Synthex User Guide

Welcome to the Synthex User Guide — your comprehensive resource for understanding and using Inwire's synthetic data generation and data management capabilities. Whether you're generating privacy-preserving training data, augmenting imbalanced datasets, or creating test fixtures, this guide will help you get the most out of Synthex.

Introduction to Synthex
Core Concepts
Getting Started
Working with Datasets
Data Profiles
Data Recipes
Synthetic Data Generation
Generator Configurations
Integration with Model Training
Example Scenarios
Best Practices
Limitations & Future Directions
API Reference

Introduction to Synthex

What is Synthex?

Synthex is Inwire's data management and synthetic data generation service. Think of it as the "data twin" of the Model Training service — while Model Training handles experiments and model development, Synthex handles everything related to data: ingestion, profiling, transformation, and generation.

Why Synthetic Data?

Synthetic data addresses several critical challenges in ML development:

Challenge	How Synthex Helps
Privacy Compliance	Generate data that preserves statistical properties without exposing real individuals
Data Scarcity	Create additional training samples when real data is limited
Class Imbalance	Generate samples for underrepresented classes
Testing & Development	Create realistic test data without production data access
Edge Cases	Generate specific scenarios that rarely occur in real data
Data Sharing	Share synthetic versions of sensitive datasets

Synthex in the ML Lifecycle

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Data Flow in Inwire                                │
└─────────────────────────────────────────────────────────────────────────────┘

  External Sources              Synthex                    Model Training
  ───────────────              ─────────                   ──────────────
  ┌─────────┐                ┌───────────┐               ┌───────────────┐
  │  CSV    │───┐            │           │               │               │
  └─────────┘   │            │  Dataset  │               │   Experiment  │
  ┌─────────┐   │  Import    │  Catalog  │    Select     │     Setup     │
  │ Parquet │───┼───────────>│           │──────────────>│               │
  └─────────┘   │            │           │               │   - Dataset   │
  ┌─────────┐   │            └─────┬─────┘               │   - Version   │
  │   S3    │───┘                  │                     │   - Recipe    │
  └─────────┘                      │                     └───────────────┘
                                   │
                            ┌──────┴──────┐
                            │             │
                            ▼             ▼
                    ┌───────────┐  ┌───────────┐
                    │  Profile  │  │  Generate │
                    │  & Clean  │  │ Synthetic │
                    └───────────┘  └───────────┘

Core Concepts

Before diving into workflows, let's understand the key concepts in Synthex:

Datasets

A Dataset is a collection of data that you've imported into Synthex or generated with it. Datasets are the foundation of all data operations.

Dataset Properties:

Name — Human-readable identifier
Type — Tabular, Text, Time Series, Image, etc.
Source — Where the data came from (upload, cloud, database)
Schema — Column definitions and data types
Statistics — Profiling results (distributions, nulls, outliers)
Tags — For organization and filtering

Dataset States:

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Pending  │───>│Profiling │───>│  Ready   │───>│ Archived │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                                      │
                                      ▼
                               ┌──────────┐
                               │Processing│ (when recipe applied)
                               └──────────┘
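
The lifecycle above can be read as a small state machine. The sketch below is an illustration only, not Synthex's implementation: the `DatasetState` enum, the `ALLOWED` table, and the `can_transition` helper are hypothetical names, and the return path from Processing back to Ready is an assumption, since the diagram only shows the arrow into Processing.

```python
from enum import Enum


class DatasetState(Enum):
    PENDING = "Pending"
    PROFILING = "Profiling"
    READY = "Ready"
    PROCESSING = "Processing"
    ARCHIVED = "Archived"


# Transitions read off the diagram. Processing -> Ready is an assumption:
# the diagram only shows the arrow from Ready into Processing.
ALLOWED = {
    DatasetState.PENDING: {DatasetState.PROFILING},
    DatasetState.PROFILING: {DatasetState.READY},
    DatasetState.READY: {DatasetState.PROCESSING, DatasetState.ARCHIVED},
    DatasetState.PROCESSING: {DatasetState.READY},
    DatasetState.ARCHIVED: set(),
}


def can_transition(current: DatasetState, target: DatasetState) -> bool:
    """Return True if the diagram permits moving from current to target."""
    return target in ALLOWED[current]


print(can_transition(DatasetState.READY, DatasetState.PROCESSING))
print(can_transition(DatasetState.PENDING, DatasetState.READY))
```

Keeping the transitions in one table makes it easy to see, for example, that a dataset cannot skip profiling on its way to Ready.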

Versions

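Every dataset maintains a version history. By default, each transformation or generation creates a new version and leaves earlier versions intact; replacing a version in place is an explicit, destructive option when you apply a recipe.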

Dataset: customer_transactions
├── v1.0 (original import)
├── v1.1 (cleaned nulls)
├── v1.2 (normalized amounts)
└── v2.0 (synthetic augmentation)

Version Types:

Type	Description
Original	Raw imported data
Cleaned	After data cleaning recipes
Transformed	After transformation recipes
Synthetic	Generated synthetic data
Augmented	Original + synthetic combined
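
One way to picture a version history is as an append-only list. The sketch below is a hypothetical model, not a Synthex API: `Version` and `VersionHistory` are illustrative names, and labelling v2.0 as Augmented is an assumption based on its "synthetic augmentation" note.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    label: str       # e.g. "v1.0"
    kind: str        # one of the version types in the table above
    note: str = ""


@dataclass
class VersionHistory:
    dataset: str
    versions: list = field(default_factory=list)

    def add(self, label, kind, note=""):
        """Append a new version; earlier versions are never modified."""
        if any(v.label == label for v in self.versions):
            raise ValueError(f"version {label} already exists")
        version = Version(label, kind, note)
        self.versions.append(version)
        return version

    def latest(self):
        return self.versions[-1]


history = VersionHistory("customer_transactions")
history.add("v1.0", "Original", "original import")
history.add("v1.1", "Cleaned", "cleaned nulls")
history.add("v1.2", "Transformed", "normalized amounts")
history.add("v2.0", "Augmented", "synthetic augmentation")  # kind is assumed
```

Because each `Version` is frozen and the list only grows, any earlier version stays available for comparison or rollback.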

Data Profiles

A Data Profile defines the schema and statistical properties of a dataset. Profiles are used for:

Validating data quality
Guiding synthetic generation
Ensuring consistency across versions

Profile Components:

profile:
  name: customer_transactions
  columns:
    - name: customer_id
      type: string
      constraints:
        - unique
        - not_null

    - name: transaction_amount
      type: float
      statistics:
        min: 0.01
        max: 99999.99
        mean: 150.75
        distribution: log_normal

    - name: is_fraud
      type: boolean
      distribution:
        true: 0.01
        false: 0.99
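
To illustrate how a profile can drive data-quality validation, here is a minimal sketch that checks rows against the `unique` and `not_null` constraints and the min/max statistics shown above. The `validate_rows` function and the dictionary layout are hypothetical, and treating the profiled `min` and `max` as hard bounds is an assumption made for the example.

```python
def validate_rows(rows, profile):
    """Check rows (a list of dicts) against per-column constraints.

    Returns a list of human-readable violation messages.
    """
    violations = []
    for column in profile["columns"]:
        name = column["name"]
        constraints = column.get("constraints", [])
        stats = column.get("statistics", {})
        values = [row.get(name) for row in rows]

        if "not_null" in constraints:
            for index, value in enumerate(values):
                if value is None:
                    violations.append(f"row {index}: {name} is null")

        if "unique" in constraints:
            seen = set()
            for index, value in enumerate(values):
                if value is not None and value in seen:
                    violations.append(f"row {index}: duplicate {name}={value!r}")
                seen.add(value)

        # Assumption: profiled min/max are treated as hard bounds here.
        low, high = stats.get("min"), stats.get("max")
        for index, value in enumerate(values):
            if value is None:
                continue
            if low is not None and value < low:
                violations.append(f"row {index}: {name}={value} below min {low}")
            if high is not None and value > high:
                violations.append(f"row {index}: {name}={value} above max {high}")
    return violations


profile = {
    "columns": [
        {"name": "customer_id", "type": "string",
         "constraints": ["unique", "not_null"]},
        {"name": "transaction_amount", "type": "float",
         "statistics": {"min": 0.01, "max": 99999.99}},
    ]
}

rows = [
    {"customer_id": "CUST-1", "transaction_amount": 25.00},
    {"customer_id": "CUST-1", "transaction_amount": 0.00},
    {"customer_id": None, "transaction_amount": 120.50},
]

problems = validate_rows(rows, profile)
for problem in problems:
    print(problem)
```

The sample rows trigger one violation of each kind: a null identifier, a duplicate identifier, and an amount below the minimum.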

Data Recipes

A Data Recipe is a reusable sequence of transformations that can be applied to datasets. Recipes ensure reproducibility and consistency.

Recipe Structure:

recipe:
  name: fraud_data_prep
  steps:
    - type: clean
      action: drop_nulls
      columns: [customer_id, transaction_amount]

    - type: filter
      condition: "transaction_amount > 0"

    - type: transform
      action: normalize
      column: transaction_amount
      method: min_max

    - type: encode
      column: category
      method: one_hot
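
The four step types in this recipe can be sketched in plain Python to show what each one does to the rows. This is an illustration under stated assumptions, not Synthex's recipe engine: the function names are hypothetical, and the filter condition is passed as a Python callable rather than parsed from a string. The sketch runs the filter before the normalization, because min-max scaling maps the smallest value to 0 and a `> 0` filter applied afterwards would drop that row.

```python
def drop_nulls(rows, columns):
    """Keep rows where every listed column has a value."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def filter_rows(rows, predicate):
    """Keep rows for which the predicate returns True."""
    return [r for r in rows if predicate(r)]


def min_max_normalize(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = high - low
    # Guard against a constant column, where min-max scaling is undefined.
    return [{**r, column: (r[column] - low) / span if span else 0.0} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"customer_id": "C1", "transaction_amount": 10.0, "category": "retail"},
    {"customer_id": None, "transaction_amount": 50.0, "category": "food"},
    {"customer_id": "C3", "transaction_amount": 30.0, "category": "food"},
    {"customer_id": "C4", "transaction_amount": -5.0, "category": "retail"},
]

rows = drop_nulls(rows, ["customer_id", "transaction_amount"])
rows = filter_rows(rows, lambda r: r["transaction_amount"] > 0)
rows = min_max_normalize(rows, "transaction_amount")
rows = one_hot(rows, "category")
```

Two of the four sample rows survive: one is dropped for a null `customer_id` and one for a non-positive amount.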

Generator Configurations

A Generator Configuration defines how synthetic data is created. It specifies:

Method — Statistical, GAN, LLM, etc.
Source — Profile or existing dataset to learn from
Parameters — Method-specific settings
Output — Where to store generated data

Modalities

Synthex supports multiple data modalities:

Modality	Description	Generation Methods
Tabular	Structured rows and columns	Statistical, GAN, VAE
Text	Natural language data	LLM, GPT, T5
Time Series	Sequential temporal data	TimeGAN, Statistical
Image	Visual data	Diffusion, GAN
Graph	Network/relationship data	GraphVAE

Getting Started

Accessing Synthex

Log in to Inwire
Click Synthex in the sidebar
You'll see the Synthex dashboard:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Synthex                                                    [+ New Dataset] │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │  Datasets   │  │  Profiles   │  │   Recipes   │  │    Jobs     │        │
│  │     47      │  │     23      │  │     15      │  │  3 Running  │        │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                                              │
│  Recent Datasets                                              [View All]    │
│  ─────────────────────────────────────────────────────────────────────     │
│  │ Name                    │ Type      │ Rows    │ Status   │ Updated     │ │
│  ├─────────────────────────┼───────────┼─────────┼──────────┼─────────────┤ │
│  │ customer_transactions   │ Tabular   │ 100,000 │ Ready    │ 2 hours ago │ │
│  │ product_reviews         │ Text      │ 50,000  │ Ready    │ 1 day ago   │ │
│  │ sensor_readings         │ TimeSeries│ 1M      │ Profiling│ 5 min ago   │ │
│  └─────────────────────────┴───────────┴─────────┴──────────┴─────────────┘ │
│                                                                              │
│  Active Jobs                                                  [View All]    │
│  ─────────────────────────────────────────────────────────────────────     │
│  │ fraud_synthetic_gen    │ Generation │ ████████░░ 80%  │ ETA: 5 min   │ │
│  │ customer_profile       │ Profiling  │ ██████████ Done │ Complete     │ │
│  └────────────────────────┴────────────┴─────────────────┴──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘

Synthex is organized into the following sections:

Section	Purpose
Datasets	Browse, import, and manage datasets
Profiles	Create and edit data profiles
Recipes	Build and manage transformation recipes
Generators	Configure synthetic data generators
Jobs	Monitor running and completed jobs
Settings	Configure Synthex preferences

Working with Datasets

Importing a Dataset

Step 1: Start Import

Go to Synthex → Datasets
Click Import Dataset (or + New Dataset)

Step 2: Choose Source

Select your data source:

Source	Description	Best For
File Upload	Upload from local machine	Small datasets, quick testing
Cloud Storage	Import from S3, GCS, Azure	Large datasets, production data
Database	Direct database query	Live data, scheduled imports
URL	Fetch from HTTP endpoint	Public datasets, APIs

File Upload Example:

┌─────────────────────────────────────────────────────────────────┐
│  Import Dataset                                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Source: [●] File Upload  [ ] Cloud  [ ] Database  [ ] URL     │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │           Drag and drop files here                      │   │
│  │                 or click to browse                      │   │
│  │                                                          │   │
│  │           Supported: CSV, Parquet, JSON, JSONL          │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Selected: transactions_2024.csv (45.2 MB)                      │
│                                                                  │
│                                              [Cancel] [Next →]  │
└─────────────────────────────────────────────────────────────────┘

Cloud Storage Example (S3):

┌─────────────────────────────────────────────────────────────────┐
│  Import from Cloud Storage                                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Integration: [Production S3            ▼]                      │
│                                                                  │
│  Path: s3://acme-ml-data/datasets/                              │
│                                                                  │
│  ├── transactions/                                               │
│  │   ├── 2023/                                                   │
│  │   └── 2024/                                                   │
│  │       ├── q1_transactions.parquet    [✓]                     │
│  │       ├── q2_transactions.parquet    [✓]                     │
│  │       └── q3_transactions.parquet    [✓]                     │
│  └── customers/                                                  │
│                                                                  │
│  Selected: 3 files (2.1 GB total)                               │
│                                                                  │
│                                              [Cancel] [Next →]  │
└─────────────────────────────────────────────────────────────────┘

Step 3: Configure Import

Set import options:

Option	Description	Default
Name	Dataset identifier	Filename
Description	What this data represents	—
Type	Data modality	Auto-detected
Tags	Organization labels	—
Auto-profile	Run profiling after import	Enabled
Sampling	Import subset for large files	Disabled

Configuration Form:

┌─────────────────────────────────────────────────────────────────┐
│  Configure Dataset                                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Name:        [customer_transactions_2024        ]              │
│                                                                  │
│  Description: [Transaction data for Q1-Q3 2024   ]              │
│               [including fraud labels            ]              │
│                                                                  │
│  Type:        [Tabular                          ▼]              │
│                                                                  │
│  Tags:        [transactions] [fraud] [2024] [+]                 │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Options                                                  │   │
│  ├─────────────────────────────────────────────────────────┤   │
│  │ [✓] Auto-profile after import                           │   │
│  │ [ ] Sample data (for large files)                       │   │
│  │     Sample size: [10000] rows                           │   │
│  │ [✓] Infer data types                                    │   │
│  │ [ ] First row is header                                 │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│                                            [← Back] [Import]    │
└─────────────────────────────────────────────────────────────────┘

Step 4: Review and Import

Click Import to start the process. You'll see:

Upload Progress — File transfer status
Schema Detection — Automatic column type inference
Profiling — Statistical analysis (if enabled)
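
To give a feel for what schema detection involves, the sketch below infers column types from a small CSV sample by trying progressively looser parsers. It is a simplified illustration, not Synthex's detection logic: the `infer_type` function, the type names, and the single datetime format are assumptions.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Infer a column type from sample strings; empty strings count as nulls."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"

    def all_match(parse):
        for v in present:
            try:
                parse(v)
            except ValueError:
                return False
        return True

    if all(v.lower() in ("true", "false") for v in present):
        return "boolean"
    if all_match(int):
        return "integer"
    if all_match(float):
        return "float"
    # Assumption: only one datetime format is recognized in this sketch.
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M")):
        return "datetime"
    return "string"


sample = io.StringIO(
    "transaction_id,timestamp,amount,is_fraud,card_type\n"
    "TXN-2024-00001,2024-01-15 14:32,127.50,false,credit\n"
    "TXN-2024-00002,2024-01-15 14:40,45.99,true,\n"
)
reader = csv.DictReader(sample)
records = list(reader)
schema = {
    name: infer_type([record[name] for record in records])
    for name in reader.fieldnames
}
print(schema)
```

The empty `card_type` in the second row is treated as a null, so the column is still inferred from the values that are present.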

Viewing Dataset Details

Click on any dataset to see its details:

┌─────────────────────────────────────────────────────────────────────────────┐
│  customer_transactions_2024                                    [⚙ Actions ▼]│
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Overview │ Schema │ Profile │ Versions │ Lineage │ Access                  │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  Status: Ready                    Created: Jan 15, 2024                     │
│  Type: Tabular                    Updated: Jan 15, 2024                     │
│  Rows: 100,000                    Size: 45.2 MB                             │
│  Columns: 12                      Version: v1.0                             │
│                                                                              │
│  Description:                                                                │
│  Transaction data for Q1-Q3 2024 including fraud labels for                 │
│  training fraud detection models.                                            │
│                                                                              │
│  Tags: [transactions] [fraud] [2024] [production]                           │
│                                                                              │
│  Quick Actions:                                                              │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐           │
│  │  Generate  │  │   Apply    │  │   Export   │  │   Clone    │           │
│  │  Synthetic │  │   Recipe   │  │            │  │            │           │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘           │
└─────────────────────────────────────────────────────────────────────────────┘

Dataset Schema Tab

View and edit column definitions:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Schema                                                        [Edit Schema]│
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  │ Column              │ Type     │ Nullable │ Unique │ Sample Values      ││
│  ├─────────────────────┼──────────┼──────────┼────────┼────────────────────┤│
│  │ transaction_id      │ string   │ No       │ Yes    │ TXN-2024-00001    ││
│  │ customer_id         │ string   │ No       │ No     │ CUST-10042        ││
│  │ timestamp           │ datetime │ No       │ No     │ 2024-01-15 14:32  ││
│  │ amount              │ float    │ No       │ No     │ 127.50, 45.99     ││
│  │ currency            │ string   │ No       │ No     │ USD, EUR, GBP     ││
│  │ merchant_id         │ string   │ No       │ No     │ MERCH-5521        ││
│  │ merchant_category   │ string   │ No       │ No     │ retail, food      ││
│  │ card_type           │ string   │ Yes      │ No     │ credit, debit     ││
│  │ is_international    │ boolean  │ No       │ No     │ true, false       ││
│  │ is_fraud            │ boolean  │ No       │ No     │ true, false       ││
│  │ fraud_type          │ string   │ Yes      │ No     │ card_theft, null  ││
│  │ risk_score          │ float    │ Yes      │ No     │ 0.15, 0.89        ││
│  └─────────────────────┴──────────┴──────────┴────────┴────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘

Dataset Actions

Action	Description
Generate Synthetic	Create synthetic version
Apply Recipe	Transform with a recipe
Export	Download or save to cloud
Clone	Create a copy
Archive	Move to archive
Delete	Remove permanently

Data Profiles

Understanding Data Profiles

A Data Profile captures the statistical "fingerprint" of your data. A profile is created automatically during import when Auto-profile is enabled (the default), and you can also define one manually.

Viewing Profile Results

After profiling completes:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Profile: customer_transactions_2024                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Summary │ Columns │ Correlations │ Anomalies │ Quality Score               │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  Overall Quality Score: 87/100 ████████▓░                                   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Metric                      │ Value          │ Status              │   │
│  ├─────────────────────────────┼────────────────┼─────────────────────┤   │
│  │ Total Rows                  │ 100,000        │ ✓                   │   │
│  │ Total Columns               │ 12             │ ✓                   │   │
│  │ Missing Values              │ 2.3%           │ ⚠ Moderate          │   │
│  │ Duplicate Rows              │ 0.1%           │ ✓ Low               │   │
│  │ Outliers Detected           │ 1.2%           │ ✓ Low               │   │
│  │ Type Consistency            │ 99.8%          │ ✓ High              │   │
│  └─────────────────────────────┴────────────────┴─────────────────────┘   │
│                                                                              │
│  Class Distribution (is_fraud):                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ False (99%)  ████████████████████████████████████████████████      │   │
│  │ True (1%)    █                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│  ⚠ Warning: Highly imbalanced classes detected                              │
└─────────────────────────────────────────────────────────────────────────────┘
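
The summary metrics above can be approximated with a few lines of Python. The sketch below is illustrative and makes assumptions the guide does not confirm: missing values are counted per cell, a duplicate is any row identical to an earlier one, and `summarize` is a hypothetical helper name.

```python
from collections import Counter


def summarize(rows, columns, label):
    """Compute missing-value, duplicate-row, and class-balance figures."""
    total_cells = len(rows) * len(columns)
    missing = sum(1 for r in rows for c in columns if r.get(c) is None)

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())

    classes = Counter(r.get(label) for r in rows)
    return {
        "missing_pct": 100 * missing / total_cells,
        "duplicate_pct": 100 * duplicates / len(rows),
        "class_share": {k: v / len(rows) for k, v in classes.items()},
    }


rows = [
    {"amount": 10.0, "card_type": "credit", "is_fraud": False},
    {"amount": 10.0, "card_type": "credit", "is_fraud": False},
    {"amount": 25.0, "card_type": None, "is_fraud": False},
    {"amount": 990.0, "card_type": "debit", "is_fraud": True},
]
report = summarize(rows, ["amount", "card_type", "is_fraud"], label="is_fraud")
print(report)
```

A class share far below the others, like the 1% fraud rate in the profile above, is the kind of signal that prompts the imbalance warning.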

Column-Level Statistics

Each column has detailed statistics:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Column: amount                                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Type: float                                                                 │
│  Missing: 0 (0.0%)                                                          │
│  Unique: 45,231 (45.2%)                                                     │
│                                                                              │
│  ┌───────────────────────┐    Statistics:                                   │
│  │      Distribution     │    ─────────────────                              │
│  │                       │    Min:     0.01                                 │
│  │    ▂▅▇█▇▅▃▂▁         │    Max:     15,420.50                            │
│  │                       │    Mean:    127.43                               │
│  │  0    500   1000+     │    Median:  78.25                                │
│  └───────────────────────┘    Std Dev:  245.67                              │
│                               Skewness: 2.34 (right-skewed)                 │
│                                                                              │
│  Distribution: Log-normal (best fit)                                         │
│                                                                              │
│  Outliers: 1,234 values > 3 std deviations                                  │
└─────────────────────────────────────────────────────────────────────────────┘
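
The figures in this view are standard summary statistics. The sketch below computes them with Python's `statistics` module on a small made-up sample; `describe` is a hypothetical helper, and the use of the population standard deviation is an assumption, since the guide does not say which variant Synthex reports.

```python
import statistics


def describe(values):
    """Summary statistics comparable to the column view above."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        skew = 0.0
    else:
        # Population skewness: the third standardized moment.
        skew = sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "skewness": skew,
        # Same rule as the view above: more than 3 standard deviations from the mean.
        "outliers": [v for v in values if abs(v - mean) > 3 * std],
    }


# Made-up sample: twenty modest amounts and one very large one.
amounts = [20.0, 35.0, 50.0, 65.0, 80.0] * 4 + [5000.0]
stats = describe(amounts)
print(stats["median"], stats["outliers"])
```

A single large value pulls the mean well above the median and yields positive skewness, the same right-skewed pattern shown for `amount`.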

Creating Custom Profiles

For synthetic data generation, you may want to define profiles manually:

Go to Synthex → Profiles
Click Create Profile
Define columns and constraints:

# Example profile definition
name: custom_transactions
description: Custom profile for transaction generation

columns:
  - name: customer_id
    type: string
    generator: uuid

  - name: amount
    type: float
    constraints:
      min: 0.01
      max: 10000
    distribution:
      type: log_normal
      mean: 100
      std: 200

  - name: is_fraud
    type: boolean
    distribution:
      true: 0.02  # 2% fraud rate
      false: 0.98

  - name: timestamp
    type: datetime
    constraints:
      min: "2024-01-01"
      max: "2024-12-31"
    distribution:
      type: uniform

correlations:
  - columns: [amount, is_fraud]
    type: positive
    strength: 0.3  # Higher amounts slightly more likely to be fraud
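
The `amount` column above combines a log-normal distribution with min/max constraints. The sketch below shows one way such a specification could be sampled. It assumes that `mean` and `std` describe the amounts themselves rather than their logarithms, which this guide does not state, and the helper names are hypothetical. Under that assumption the parameters of the underlying normal are sigma^2 = ln(1 + std^2 / mean^2) and mu = ln(mean) - sigma^2 / 2.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert the mean/std of a log-normal variable to (mu, sigma) of its log."""
    sigma_sq = math.log(1 + (std / mean) ** 2)
    return math.log(mean) - sigma_sq / 2, math.sqrt(sigma_sq)


def sample_amounts(n, mean, std, low, high, seed=None):
    """Draw n log-normal values, rejecting any outside [low, high]."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, std)
    amounts = []
    while len(amounts) < n:
        value = rng.lognormvariate(mu, sigma)
        if low <= value <= high:
            amounts.append(round(value, 2))
    return amounts


mu, sigma = lognormal_params(100, 200)
amounts = sample_amounts(1000, mean=100, std=200, low=0.01, high=10000, seed=42)
print(round(mu, 4), round(sigma, 4))
```

Rejecting out-of-range draws keeps every value inside the constraints, at the cost of shifting the realized mean slightly away from the target.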

Data Recipes

What are Recipes?

Data Recipes are reusable transformation pipelines that you can apply to datasets. They ensure:

Reproducibility — Same transformations every time
Consistency — Apply to multiple datasets
Auditability — Track what was done to data

Creating a Recipe

Go to Synthex → Recipes
Click Create Recipe
Add transformation steps:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Create Recipe                                                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Name:        [fraud_data_preparation        ]                              │
│  Description: [Prepare transaction data for fraud detection training]       │
│                                                                              │
│  Steps:                                                        [+ Add Step] │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  1. ┌─────────────────────────────────────────────────────────────────┐    │
│     │ Clean: Drop Null Values                                [✎] [×] │    │
│     │ Columns: customer_id, amount, timestamp                        │    │
│     │ Action: Drop rows with null values in specified columns        │    │
│     └─────────────────────────────────────────────────────────────────┘    │
│                               ↓                                             │
│  2. ┌─────────────────────────────────────────────────────────────────┐    │
│     │ Filter: Remove Invalid Transactions                    [✎] [×] │    │
│     │ Condition: amount > 0 AND amount < 100000                      │    │
│     │ Action: Keep only rows matching condition                       │    │
│     └─────────────────────────────────────────────────────────────────┘    │
│                               ↓                                             │
│  3. ┌─────────────────────────────────────────────────────────────────┐    │
│     │ Transform: Normalize Amount                            [✎] [×] │    │
│     │ Column: amount                                                  │    │
│     │ Method: Log transformation                                      │    │
│     └─────────────────────────────────────────────────────────────────┘    │
│                               ↓                                             │
│  4. ┌─────────────────────────────────────────────────────────────────┐    │
│     │ Encode: One-Hot Encoding                               [✎] [×] │    │
│     │ Column: merchant_category                                       │    │
│     │ Output: merchant_category_retail, merchant_category_food, ...   │    │
│     └─────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│                                              [Cancel] [Save Recipe]         │
└─────────────────────────────────────────────────────────────────────────────┘

Recipe Step Types

Step Type	Description	Use Case
Clean	Handle missing/invalid data	Data quality
Filter	Remove rows by condition	Data selection
Transform	Modify column values	Feature engineering
Encode	Convert categorical data	ML preparation
Aggregate	Group and summarize	Feature creation
Join	Combine with other datasets	Data enrichment
Sample	Random subset	Testing, balancing
Split	Divide into subsets	Train/test split

Applying a Recipe

To apply a recipe to a dataset:

Go to the dataset detail page
Click Apply Recipe
Select the recipe
Choose output options:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Apply Recipe                                                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Dataset: customer_transactions_2024                                         │
│  Recipe:  [fraud_data_preparation                 ▼]                        │
│                                                                              │
│  Output Options:                                                             │
│  ──────────────                                                             │
│  [●] Create new version (recommended)                                       │
│      Version name: [v1.1-cleaned                  ]                         │
│                                                                              │
│  [ ] Create new dataset                                                     │
│      Dataset name: [                              ]                         │
│                                                                              │
│  [ ] Replace current version (destructive)                                  │
│                                                                              │
│  Preview Changes:                                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Step 1: Drop Null Values                                            │   │
│  │   → Rows affected: 2,312 (will be removed)                          │   │
│  │                                                                      │   │
│  │ Step 2: Filter Invalid Transactions                                 │   │
│  │   → Rows affected: 45 (will be removed)                             │   │
│  │                                                                      │   │
│  │ Step 3: Normalize Amount                                            │   │
│  │   → Column 'amount' will be log-transformed                         │   │
│  │                                                                      │   │
│  │ Step 4: One-Hot Encoding                                            │   │
│  │   → 8 new columns will be created                                   │   │
│  │                                                                      │   │
│  │ Final: 97,643 rows, 19 columns                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│                                              [Cancel] [Apply Recipe]        │
└─────────────────────────────────────────────────────────────────────────────┘
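
The final figures in the preview follow from the per-step counts. The short check below reproduces them, starting from the 100,000 rows and 12 columns of the source dataset and assuming that one-hot encoding replaces `merchant_category` with its 8 indicator columns.

```python
initial_rows, initial_columns = 100_000, 12

rows_after_nulls = initial_rows - 2_312     # Step 1: drop null values
rows_after_filter = rows_after_nulls - 45   # Step 2: filter invalid transactions

# Step 3 changes values only. Step 4 (assumption): the 8 indicator
# columns replace the original merchant_category column.
final_columns = initial_columns - 1 + 8

print(rows_after_filter, final_columns)
```

Checking the preview this way is a quick sanity test before applying a recipe, especially with the destructive replace option.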

Synthetic Data Generation

Generation Methods

Synthex supports multiple synthetic data generation methods:

Method	Description	Best For	Speed
Statistical	Preserves distributions and correlations	Tabular data, quick generation	Fast
GAN	Generative Adversarial Networks	Complex patterns, high fidelity	Slow
VAE	Variational Autoencoders	Balanced quality/speed	Medium
CTGAN	Conditional Tabular GAN	Mixed-type tabular data	Medium
CopulaGAN	Copula-based GAN	Preserving correlations	Medium
LLM	Large Language Models	Text data, complex semantics	Slow
TimeGAN	Temporal GAN	Time series data	Slow
Diffusion	Diffusion models	High-quality images	Very Slow

Starting a Generation Job

Go to Synthex → Datasets
Select your source dataset
Click Generate Synthetic
Configure generation:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Generate Synthetic Data                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Source: customer_transactions_2024 (v1.0)                                  │
│                                                                              │
│  Generation Method                                                           │
│  ──────────────────                                                         │
│  [●] Statistical (Copula)     - Fast, good for tabular data                │
│  [ ] CTGAN                    - Better for mixed types, slower              │
│  [ ] CopulaGAN               - Best correlation preservation                │
│  [ ] GaussianCopula          - Fastest, simple distributions                │
│                                                                              │
│  Configuration                                                               │
│  ──────────────                                                             │
│  Number of records:    [100000        ]                                     │
│                                                                              │
│  Privacy Settings:                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ [✓] Enable differential privacy                                     │   │
│  │     Epsilon (ε): [1.0        ] (lower = more private)              │   │
│  │                                                                      │   │
│  │ [✓] Anonymize identifiers                                           │   │
│  │     Columns: customer_id, merchant_id                               │   │
│  │                                                                      │   │
│  │ [ ] Add noise to numerical columns                                  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  Advanced Options:                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Batch size:        [1000       ]                                    │   │
│  │ Random seed:       [42         ] (for reproducibility)             │   │
│  │ Constraint handling: [Reject invalid ▼]                            │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  Output                                                                      │
│  ──────                                                                     │
│  [●] Create new synthetic dataset                                           │
│      Name: [customer_transactions_2024_synthetic  ]                         │
│                                                                              │
│  [ ] Augment existing dataset (combine with original)                       │
│      Augmentation ratio: [50%         ]                                     │
│                                                                              │
│                                         [Cancel] [Start Generation]         │
└─────────────────────────────────────────────────────────────────────────────┘

Monitoring Generation Jobs

Track job progress in the Jobs view:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Job: fraud_synthetic_generation                              [Cancel Job]  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Status: Running                                                             │
│  Progress: ████████████████████░░░░░░░░░░ 65%                               │
│                                                                              │
│  Started: Jan 15, 2024 14:32:15                                             │
│  Elapsed: 12 minutes                                                         │
│  ETA: ~6 minutes remaining                                                   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Stage                        │ Status    │ Progress  │ Duration    │   │
│  ├──────────────────────────────┼───────────┼───────────┼─────────────┤   │
│  │ 1. Load source data          │ Complete  │ 100%      │ 0:45        │   │
│  │ 2. Fit generator model       │ Complete  │ 100%      │ 8:23        │   │
│  │ 3. Generate synthetic rows   │ Running   │ 65%       │ 3:12        │   │
│  │ 4. Validate output           │ Pending   │ —         │ —           │   │
│  │ 5. Save results              │ Pending   │ —         │ —           │   │
│  └──────────────────────────────┴───────────┴───────────┴─────────────┘   │
│                                                                              │
│  Logs:                                                                       │
│  ───────────────────────────────────────────────────────────────────────── │
│  [14:32:15] Starting generation job...                                       │
│  [14:33:00] Source data loaded: 100,000 rows                                │
│  [14:33:02] Fitting CTGAN model...                                          │
│  [14:41:25] Model training complete                                          │
│  [14:41:26] Generating synthetic samples: batch 1/100                       │
│  [14:44:38] Progress: 65,000/100,000 samples generated                      │
└─────────────────────────────────────────────────────────────────────────────┘

Evaluating Synthetic Data Quality

After generation, Synthex automatically evaluates quality:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Quality Report: customer_transactions_2024_synthetic                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Overall Quality Score: 92/100 █████████▓                                   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Metric                           │ Score  │ Status                  │   │
│  ├──────────────────────────────────┼────────┼─────────────────────────┤   │
│  │ Statistical Similarity           │ 94%    │ ✓ Excellent             │   │
│  │ Distribution Match (KL Div)      │ 0.02   │ ✓ Very Low              │   │
│  │ Correlation Preservation         │ 91%    │ ✓ Good                  │   │
│  │ Constraint Satisfaction          │ 100%   │ ✓ Perfect               │   │
│  │ Privacy Score (k-anonymity)      │ k=15   │ ✓ Strong                │   │
│  │ ML Efficacy                      │ 89%    │ ✓ Good                  │   │
│  └──────────────────────────────────┴────────┴─────────────────────────┘   │
│                                                                              │
│  Column-by-Column Comparison:                                                │
│  ──────────────────────────────────────────────────────────────────────────│
│                                                                              │
│  amount:                                                                     │
│  ┌────────────────────────┐  ┌────────────────────────┐                    │
│  │ Original Distribution  │  │ Synthetic Distribution │                    │
│  │     ▂▅▇█▇▅▃▂▁         │  │     ▂▄▇█▇▅▃▂▁         │                    │
│  │  0    500   1000+      │  │  0    500   1000+      │                    │
│  └────────────────────────┘  └────────────────────────┘                    │
│  Similarity: 96% ████████████████████░                                      │
│                                                                              │
│  is_fraud:                                                                   │
│  Original:  True: 1.02%  False: 98.98%                                      │
│  Synthetic: True: 1.01%  False: 98.99%                                      │
│  Similarity: 99% ████████████████████                                       │
│                                                                              │
│                                               [Export Report] [Download PDF] │
└─────────────────────────────────────────────────────────────────────────────┘
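
The report lists a KL-divergence score without showing how such a number is obtained. The sketch below applies the standard discrete KL definition to the is_fraud proportions from the report. How Synthex bins columns or aggregates per-column scores is not documented here, so the function name and the choice of natural logarithm are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(P || Q) in nats for two probability lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a bin with zero original mass contributes nothing
        total += pi * math.log(pi / max(qi, eps))  # eps guards against q == 0
    return total

# is_fraud proportions from the report above, ordered [True, False]
original = [0.0102, 0.9898]
synthetic = [0.0101, 0.9899]

print(f"KL(original || synthetic) = {kl_divergence(original, synthetic):.2e}")
```

A value of 0 means the two distributions are identical, and larger values mean the synthetic column has drifted further from the original.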

Generator Configurations

Saving Generator Configs

For repeatable generation, save your configurations:

# Example saved configuration
name: fraud_augmentation_config
description: Generate fraud cases for training data augmentation

source:
  type: dataset
  name: customer_transactions_2024
  version: v1.0
  filter: "is_fraud = true"  # Learn only from fraud cases

method: ctgan
parameters:
  epochs: 300
  batch_size: 500
  discriminator_steps: 1
  log_frequency: true

privacy:
  differential_privacy:
    enabled: true
    epsilon: 1.0
  anonymize_columns:
    - customer_id
    - merchant_id

output:
  records: 50000
  format: parquet
  destination: s3://ml-data/synthetic/

Managing Configurations

View and manage saved configs:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Generator Configurations                                    [+ New Config] │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  │ Name                      │ Method  │ Source Dataset     │ Last Used    ││
│  ├───────────────────────────┼─────────┼────────────────────┼──────────────┤│
│  │ fraud_augmentation        │ CTGAN   │ customer_trans...  │ 2 days ago   ││
│  │ review_generation         │ GPT-4   │ product_reviews    │ 1 week ago   ││
│  │ sensor_timeseries         │ TimeGAN │ sensor_readings    │ 3 days ago   ││
│  │ privacy_safe_customers    │ Copula  │ customer_data      │ Today        ││
│  └───────────────────────────┴─────────┴────────────────────┴──────────────┘│
│                                                                              │
│  Selected: fraud_augmentation                                                │
│  [Run Now] [Edit] [Clone] [Delete] [View History]                           │
└─────────────────────────────────────────────────────────────────────────────┘

Integration with Model Training

The Synthex-Training Connection

Synthex feeds Model Training directly: datasets, recipe-produced versions, and synthetic datasets all appear in the training Dataset Selector:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Data-to-Model Workflow                                │
└─────────────────────────────────────────────────────────────────────────────┘

                    Synthex                              Model Training
    ┌────────────────────────────────────┐    ┌────────────────────────────────┐
    │                                    │    │                                │
    │  ┌──────────┐                      │    │      ┌──────────────────┐     │
    │  │ Datasets │──────────────────────│───>│──────│ Dataset Selector │     │
    │  └──────────┘                      │    │      └────────┬─────────┘     │
    │       │                            │    │               │               │
    │       ▼                            │    │               ▼               │
    │  ┌──────────┐                      │    │      ┌──────────────────┐     │
    │  │ Profiles │                      │    │      │   Experiment     │     │
    │  └──────────┘                      │    │      │   Configuration  │     │
    │       │                            │    │      └────────┬─────────┘     │
    │       ▼                            │    │               │               │
    │  ┌──────────┐      ┌──────────┐   │    │               ▼               │
    │  │ Recipes  │─────>│ Versions │───│───>│      ┌──────────────────┐     │
    │  └──────────┘      └──────────┘   │    │      │  Training Run    │     │
    │                                    │    │      └──────────────────┘     │
    │  ┌──────────┐                      │    │                                │
    │  │Synthetic │──────────────────────│───>│                                │
    │  └──────────┘                      │    │                                │
    │                                    │    │                                │
    └────────────────────────────────────┘    └────────────────────────────────┘

Selecting Data in Training Wizard

When creating a training experiment:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Training Wizard - Step 2: Select Dataset                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Select data source for training:                                            │
│                                                                              │
│  Data Source: [From Synthex               ▼]                                │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Available Datasets                                   [Search...]    │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                      │   │
│  │ □ customer_transactions_2024                                         │   │
│  │   │── v1.0 (original)           100,000 rows                        │   │
│  │   │── v1.1 (cleaned)            97,643 rows                         │   │
│  │   └── v2.0 (augmented)          147,643 rows   ← Recommended        │   │
│  │                                                                      │   │
│  │ ☑ customer_transactions_2024_synthetic                              │   │
│  │   └── v1.0 (generated)          100,000 rows                        │   │
│  │                                                                      │   │
│  │ □ product_reviews                                                    │   │
│  │   └── v1.0 (original)           50,000 rows                         │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  Apply Recipe: [fraud_data_preparation    ▼] (optional)                     │
│                                                                              │
│  Data Split:                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Training:   [70%  ] ████████████████████████████░░░░░░░░░░░░░░░░   │   │
│  │ Validation: [15%  ] ░░░░░░░░░░░░░░░░░░░░░░░░░░░░█████░░░░░░░░░░░   │   │
│  │ Test:       [15%  ] ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░█████░░░░░░   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  Stratify by: [is_fraud               ▼] (maintain class distribution)     │
│                                                                              │
│                                               [← Back] [Next: Configure →]  │
└─────────────────────────────────────────────────────────────────────────────┘
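
To make the "Stratify by" option concrete, here is a minimal sketch of a stratified 70/15/15 split in plain Python. It illustrates the general technique, not the wizard's actual implementation. The function name, the rounding rule, and the toy rows are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split rows into train/val/test, keeping each label's share roughly equal."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder goes to test
    return train, val, test

# Toy data with 1% fraud, mirroring the is_fraud column used in this guide.
rows = [{"id": i, "is_fraud": i % 100 == 0} for i in range(10_000)]
train, val, test = stratified_split(rows, "is_fraud")
for name, part in (("train", train), ("val", val), ("test", test)):
    rate = sum(r["is_fraud"] for r in part) / len(part)
    print(f"{name}: {len(part)} rows, fraud rate {rate:.2%}")
```

Splitting each class separately is what keeps the minority class present in all three partitions. A plain random split of a 1% class can leave the validation or test set with very few positives.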

Lineage Tracking

Model Training records exactly which data was used:

┌─────────────────────────────────────────────────────────────────────────────┐
│  Experiment: fraud-detector-v3                                               │
│  Run: run-2024-01-15-001                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Data Lineage:                                                               │
│  ─────────────                                                              │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                                                                      │   │
│  │  ┌───────────────────┐     ┌───────────────────┐                    │   │
│  │  │ customer_trans... │────>│ fraud_data_prep   │                    │   │
│  │  │ v1.0 (original)   │     │ (recipe applied)  │                    │   │
│  │  └───────────────────┘     └─────────┬─────────┘                    │   │
│  │                                      │                               │   │
│  │  ┌───────────────────┐               │                               │   │
│  │  │ synthetic_fraud   │               │                               │   │
│  │  │ v1.0 (generated)  │───────────────┤                               │   │
│  │  └───────────────────┘               │                               │   │
│  │                                      ▼                               │   │
│  │                            ┌───────────────────┐                     │   │
│  │                            │ Training Dataset  │                     │   │
│  │                            │ 147,643 rows      │                     │   │
│  │                            │ 70% train/15% val │                     │   │
│  │                            └─────────┬─────────┘                     │   │
│  │                                      │                               │   │
│  │                                      ▼                               │   │
│  │                            ┌───────────────────┐                     │   │
│  │                            │ fraud-detector-v3 │                     │   │
│  │                            │ (trained model)   │                     │   │
│  │                            └───────────────────┘                     │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  [View Full Lineage] [Export Lineage Report]                                │
└─────────────────────────────────────────────────────────────────────────────┘
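
A lineage view like this one is a directed graph, and the question "which data influenced this model?" is an upstream traversal. The sketch below transcribes the edges from the view above into a dictionary. The dictionary layout and node labels are assumptions made for illustration, not an export format defined by Inwire.

```python
def upstream_sources(lineage, node):
    """Return every ancestor of `node` in a lineage graph (child -> list of parents)."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Edges transcribed from the lineage view above (child -> parents).
# Labels are kept exactly as displayed, including the truncated dataset name.
lineage = {
    "fraud-detector-v3": ["Training Dataset"],
    "Training Dataset": ["fraud_data_prep", "synthetic_fraud v1.0"],
    "fraud_data_prep": ["customer_trans... v1.0"],
}

print(sorted(upstream_sources(lineage, "fraud-detector-v3")))
```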

Example Scenarios

Scenario 1: Augmenting Imbalanced Fraud Data

Problem: You have transaction data with only 1% fraud cases, leading to poor model performance on the minority class.

Solution: Generate synthetic fraud cases to balance the dataset.

Step 1: Analyze the Imbalance

Go to Synthex → Datasets
Open customer_transactions
View the Profile tab

Class Distribution (is_fraud):
├── False: 99,000 (99%)
└── True:   1,000 (1%)

⚠ Warning: Severe class imbalance detected
   Recommendation: Consider synthetic augmentation

Step 2: Create Fraud-Only Generator

Click Generate Synthetic
Configure:

- Filter source: is_fraud = true (learn only from fraud patterns)

- Method: CTGAN

- Records: 49,000 (raises fraud to ~34% of rows after augmentation; a 50/50 balance would need 98,000)

# Generation config
source_filter: "is_fraud = true"
method: ctgan
parameters:
  epochs: 500
  batch_size: 100

output:
  records: 49000
  mode: augment  # Combine with original

Step 3: Run Generation and Validate

After generation completes:

Augmented Dataset Summary:
├── Total rows: 149,000
├── Original (real): 100,000
├── Synthetic fraud: 49,000
└── Class distribution:
    ├── False: 99,000 (66.4%)
    └── True:  50,000 (33.6%)

Quality Score: 91/100
Fraud pattern preservation: 94%
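
The class distribution in this summary follows from simple arithmetic. The sketch below reproduces it and solves for the number of synthetic rows needed to hit a chosen minority share. Both function names are illustrative and not part of Synthex.

```python
def fraction_after_augmentation(majority, minority, synthetic_minority):
    """Minority share of the dataset once synthetic minority rows are added."""
    return (minority + synthetic_minority) / (majority + minority + synthetic_minority)

def synthetic_rows_needed(majority, minority, target_fraction):
    """Synthetic minority rows required to reach `target_fraction` (0 < target < 1).

    Solves (minority + x) / (majority + minority + x) = target_fraction for x.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    needed = (target_fraction * (majority + minority) - minority) / (1 - target_fraction)
    return max(0, round(needed))

# Fraud share after adding 49,000 synthetic fraud rows to 99,000 / 1,000
print(f"{fraction_after_augmentation(99_000, 1_000, 49_000):.1%}")
# Synthetic fraud rows needed for an even 50/50 split
print(synthetic_rows_needed(99_000, 1_000, 0.5))
```

Running the numbers before starting a job avoids generating far more or far fewer rows than the target balance requires.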

Step 4: Use in Training

Go to Model Training → New Experiment
Select the augmented dataset
Enable stratified sampling
Train your model

Result: In this example, model recall on fraud cases improves from 45% to 82%.

Scenario 2: Creating Privacy-Safe Test Data

Problem: Your QA team needs realistic test data but cannot access production data due to privacy regulations.

Solution: Generate synthetic data that preserves statistical properties without containing real customer information.

Step 1: Import and Profile Production Data

Work with your data team to import a sample of production data into Synthex. Enable automatic profiling.

Step 2: Configure Privacy-Safe Generation

Select the dataset
Click Generate Synthetic
Enable privacy features:

# Privacy-focused configuration
method: gaussian_copula
parameters:
  default_distribution: parametric

privacy:
  differential_privacy:
    enabled: true
    epsilon: 0.5  # Strong privacy guarantee

  anonymize_columns:
    - customer_id
    - email
    - phone
    - address

  pii_handling:
    names: synthetic  # Generate fake names
    dates: shift      # Randomly shift dates
    amounts: noise    # Add controlled noise

output:
  records: 10000
  format: csv
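
The config describes epsilon only by its effect (lower means more private). One common way that trade-off appears is the Laplace mechanism, where the noise scale is sensitivity / epsilon. The sketch below illustrates that relationship only. This guide does not say which mechanism Synthex uses, and the sensitivity of 1.0 is an assumed value.

```python
import random

def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

def add_laplace_noise(value, sensitivity, epsilon, rng):
    """Return value plus Laplace(0, b) noise."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two independent exponential draws with mean b
    # follows a Laplace distribution with location 0 and scale b.
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)

rng = random.Random(42)
for eps in (1.0, 0.5):
    noisy = [add_laplace_noise(100.0, sensitivity=1.0, epsilon=eps, rng=rng) for _ in range(5)]
    print(f"epsilon={eps}: scale={laplace_scale(1.0, eps)}", [round(x, 2) for x in noisy])
```

Halving epsilon from 1.0 to 0.5 doubles the noise scale, which is the cost in accuracy paid for the stronger guarantee.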

Step 3: Validate Privacy

Review the privacy report:

Privacy Assessment:
├── k-anonymity: k=50 (Strong)
├── l-diversity: l=10 (Strong)
├── Re-identification risk: <0.1% (Minimal)
└── PII detection: None found

All identifiers successfully anonymized.
No real customer data exposed.
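
The k-anonymity figure in the report can be spot-checked on an exported sample. Under the usual definition, k is the size of the smallest group of rows that share the same quasi-identifier values. The column names and rows below are hypothetical, and Synthex's own choice of quasi-identifiers is not documented here.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size when rows are grouped by the quasi-identifier columns."""
    if not rows:
        raise ValueError("rows must not be empty")
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())

# Hypothetical sample; age_band and region act as quasi-identifiers.
rows = [
    {"age_band": "30-39", "region": "north", "amount": 120.0},
    {"age_band": "30-39", "region": "north", "amount": 80.5},
    {"age_band": "30-39", "region": "north", "amount": 43.0},
    {"age_band": "40-49", "region": "south", "amount": 310.0},
    {"age_band": "40-49", "region": "south", "amount": 15.2},
]
print(k_anonymity(rows, ["age_band", "region"]))
```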

Step 4: Export for QA

Click Export
Choose format (CSV, JSON)
Download or send to cloud storage

Result: The QA team has realistic test data that contains no real customer records.

Scenario 3: Generating Training Data from Schema

Problem: You're building a new feature but have no historical data yet. You need realistic data to develop and test your model.

Solution: Define a data profile from scratch and generate synthetic data matching your expected schema.

Step 1: Create a Custom Profile

Go to Synthex → Profiles
Click Create Profile
Define your expected schema:

name: subscription_churn_profile
description: Expected data for subscription churn prediction

columns:
  - name: user_id
    type: string
    generator: uuid

  - name: signup_date
    type: datetime
    constraints:
      min: "2022-01-01"
      max: "2024-01-01"
    distribution:
      type: uniform

  - name: subscription_tier
    type: category
    values: [free, basic, premium, enterprise]
    distribution:
      free: 0.40
      basic: 0.35
      premium: 0.20
      enterprise: 0.05

  - name: monthly_usage_hours
    type: float
    constraints:
      min: 0
      max: 200
    distribution:
      type: beta
      alpha: 2
      beta: 5
      scale: 200

  - name: support_tickets
    type: integer
    constraints:
      min: 0
      max: 50
    distribution:
      type: poisson
      lambda: 2

  - name: churned
    type: boolean
    distribution:
      true: 0.15
      false: 0.85

correlations:
  - columns: [monthly_usage_hours, churned]
    type: negative
    strength: 0.4  # Less usage → more likely to churn

  - columns: [support_tickets, churned]
    type: positive
    strength: 0.3  # More tickets → more likely to churn
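
To show what the distribution settings in this profile mean, the sketch below draws rows from the same marginals using only the Python standard library. It is an illustration, not Synthex's generator. It skips signup_date and ignores the correlations block, and clamping support_tickets to the maximum is an assumed way of handling the constraint.

```python
import math
import random
import uuid

TIERS = {"free": 0.40, "basic": 0.35, "premium": 0.20, "enterprise": 0.05}

def sample_poisson(lam, rng):
    """Knuth's algorithm; adequate for a small lambda such as 2."""
    limit, k, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def sample_row(rng):
    """Draw one row from the profile's marginal distributions (no correlations)."""
    return {
        "user_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "subscription_tier": rng.choices(list(TIERS), weights=list(TIERS.values()))[0],
        "monthly_usage_hours": 200 * rng.betavariate(2, 5),  # Beta(2, 5) scaled to [0, 200]
        "support_tickets": min(sample_poisson(2, rng), 50),   # clamp to the max constraint
        "churned": rng.random() < 0.15,
    }

rng = random.Random(42)
rows = [sample_row(rng) for _ in range(50_000)]
churn_rate = sum(r["churned"] for r in rows) / len(rows)
free_share = sum(r["subscription_tier"] == "free" for r in rows) / len(rows)
print(f"churn rate: {churn_rate:.1%}, free tier share: {free_share:.1%}")
```

Because each column is sampled independently here, usage and churn come out uncorrelated. Reproducing the correlations block needs a joint model, such as the copula-based methods described earlier.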

Step 2: Generate from Profile

Go to Synthex → Generate Data
Select From Profile
Choose your custom profile
Set record count (e.g., 50,000)

Step 3: Validate Generated Data

Review that distributions match expectations:

Generated Dataset Validation:
├── subscription_tier distribution: ✓ Matches profile
├── churned rate: 14.8% (expected 15%) ✓
├── usage-churn correlation: -0.38 (expected -0.4) ✓
└── All constraints satisfied: ✓
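
The same checks can be repeated on an exported sample. The sketch below computes a Pearson correlation and compares observed values against profile targets. The 0.05 absolute tolerance is an assumption, since this guide does not state the threshold Synthex applies.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def within(observed, expected, tolerance):
    """True when the observed value is within an absolute tolerance of the target."""
    return abs(observed - expected) <= tolerance

# Observed and expected values taken from the validation summary above.
print("churn rate ok:", within(0.148, 0.15, 0.05))
print("usage-churn correlation ok:", within(-0.38, -0.40, 0.05))
```

For a boolean column such as churned, pass the values as 0 and 1 to `pearson`.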

Step 4: Iterate and Refine

As you develop, refine your profile based on insights:

# Updated profile based on domain feedback
columns:
  - name: monthly_usage_hours
    type: float
    # Refined: usage differs by tier
    conditional_distribution:
      on: subscription_tier
      distributions:
        free:
          type: exponential
          scale: 10
        premium:
          type: normal
          mean: 80
          std: 30
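
A conditional distribution picks the sampling rule from another column's value. The sketch below shows that logic for the two tiers refined above. Falling back to the original Beta(2, 5) distribution for the other tiers and clipping to the 0-200 range are assumptions of this sketch.

```python
import random

def sample_usage_hours(tier, rng):
    """Sample monthly_usage_hours given the subscription tier, clipped to [0, 200]."""
    if tier == "free":
        hours = rng.expovariate(1 / 10)   # exponential with scale (mean) 10
    elif tier == "premium":
        hours = rng.gauss(80, 30)         # normal with mean 80, std 30
    else:
        # basic and enterprise are not refined in the snippet above, so this
        # sketch reuses the original Beta(2, 5) * 200 distribution.
        hours = 200 * rng.betavariate(2, 5)
    return min(max(hours, 0.0), 200.0)

rng = random.Random(42)
for tier in ("free", "premium", "basic"):
    samples = [sample_usage_hours(tier, rng) for _ in range(20_000)]
    print(f"{tier}: mean usage {sum(samples) / len(samples):.1f} hours")
```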

Result: Realistic development data aligned with business expectations.

Scenario 4: Text Data Generation for NLP

Problem: You need training data for a customer service classifier but have limited labeled examples.

Solution: Use LLM-based generation to create diverse training examples.

Step 1: Prepare Seed Examples

Upload a small dataset of labeled examples:

Category: billing_question
Examples:
- "Why was I charged twice this month?"
- "Can you explain this fee on my invoice?"
- "I need a refund for the overcharge"

Category: technical_support
Examples:
- "The app keeps crashing when I open it"
- "I can't log in to my account"
- "The feature isn't working as expected"

Category: general_inquiry
Examples:
- "What are your business hours?"
- "Do you ship internationally?"
- "How do I contact support?"

Step 2: Configure LLM Generation

Select the seed dataset
Choose LLM method
Configure:

method: llm
parameters:
  model: gpt-4
  temperature: 0.8  # Some creativity

  prompt_template: |
    Generate a diverse customer service message for category: {category}

    Examples of this category:
    {examples}

    Generate a new, unique message that fits this category.
    Be diverse in tone, length, and specific issues.

  preserve_label: true
  variations_per_example: 10

constraints:
  min_length: 20
  max_length: 200
  language: english
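
The prompt template relies on two placeholders, {category} and {examples}. The sketch below shows how those placeholders might be filled for one category before the prompt is sent to the model. Rendering the examples as a quoted bullet list is an assumption, and no LLM call is made.

```python
PROMPT_TEMPLATE = """\
Generate a diverse customer service message for category: {category}

Examples of this category:
{examples}

Generate a new, unique message that fits this category.
Be diverse in tone, length, and specific issues.
"""

def render_prompt(category, examples):
    """Fill the template's {category} and {examples} placeholders."""
    bullet_list = "\n".join(f'- "{text}"' for text in examples)
    return PROMPT_TEMPLATE.format(category=category, examples=bullet_list)

prompt = render_prompt(
    "billing_question",
    ["Why was I charged twice this month?", "I need a refund for the overcharge"],
)
print(prompt)
```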

Step 3: Review and Filter

Generated examples are scored for quality:

Generated Examples (billing_question):

[Score: 0.95] "I noticed an unexpected $15 charge on my
              statement dated March 3rd - could you help
              me understand what this is for?"

[Score: 0.91] "hey there, pretty confused about my bill.
              shows i paid but also have a balance due??"

[Score: 0.87] "Looking at my invoice #INV-2024-001, the
              subtotal doesn't seem to match the itemized
              charges. Please advise."

[Score: 0.45] "Payment question" [Filtered - too short]
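
The "too short" filter corresponds to the min_length and max_length constraints from Step 2. The sketch below applies them in plain Python, assuming lengths are measured in characters (the guide does not say whether characters or tokens are used).

```python
def passes_constraints(text, min_length=20, max_length=200):
    """Apply the min_length/max_length constraints, measured in characters."""
    return min_length <= len(text.strip()) <= max_length

candidates = [
    "hey there, pretty confused about my bill. shows i paid but also have a balance due??",
    "Payment question",
]
kept = [text for text in candidates if passes_constraints(text)]
print(kept)
```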

Step 4: Export for Training

Export the quality-filtered dataset for NLP model training.

Result: Expanded training set from 100 to 1,000+ labeled examples.

Best Practices

Data Quality

Practice	Description
Profile before generating	Always understand your source data's statistics
Validate outputs	Check that generated data matches expected distributions
Preserve correlations	Use methods that maintain relationships between columns
Test with real models	Validate synthetic data improves actual model performance

Privacy & Security

Practice	Description
Enable differential privacy	For sensitive data, always use DP guarantees
Anonymize identifiers	Never generate data that could identify real individuals
Audit before sharing	Review generated data before distributing
Document data lineage	Track which real data influenced synthetic outputs

Performance

Practice	Description
Start with statistical methods	Fastest and often sufficient
Use GANs for complex patterns	When statistical methods aren't capturing nuances
Batch large generations	Split very large generation jobs
Cache generator models	Reuse trained generators for multiple outputs

Reproducibility

Practice	Description
Set random seeds	Enable exact reproduction of results
Version your configs	Save and version generator configurations
Document generation params	Record all parameters used
Link to experiments	Track which generated data trained which models

Limitations & Future Directions

Current Limitations

Limitation	Description	Workaround
Complex dependencies	Very intricate column relationships may not be fully captured	Use domain-specific post-processing
Rare events	Extremely rare patterns (< 0.1%) are hard to learn	Oversample rare cases in source data
Sequential consistency	Time series generation may have temporal artifacts	Use TimeGAN or domain validation
Image quality	High-resolution images require significant compute	Start with lower resolution, upscale
Multi-table	Cross-table relationships require manual configuration	Use relational synthesis features

Planned Features

Federated synthesis — Generate from distributed data without centralizing
Active learning integration — Prioritize generation of samples that help models most
Real-time generation — Stream synthetic data on-demand
AutoML for generation — Automatically select best method and parameters
Multi-modal synthesis — Generate coherent text + tabular + image combinations

Method Selection Guide

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Choosing a Generation Method                              │
└─────────────────────────────────────────────────────────────────────────────┘

                               What type of data?
                                        │
              ┌─────────────────────────┼─────────────────────┐
              ▼                         ▼                     ▼
           Tabular                    Text               Time Series
              │                         │                     │
       ┌──────┴──────┐           ┌──────┴──────┐              │
       ▼             ▼           ▼             ▼              ▼
    Simple?      Complex?     Short/         Long/        Regular?
       │             │        Labels?      Context?           │
       ▼             ▼           │             │              ▼
  Statistical     CTGAN/         ▼             ▼           TimeGAN
   (Copula)      CopulaGAN   Template       LLM/GPT
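
The decision tree above can also be written as a small function, which is handy when the method choice has to be scripted. The function and parameter names are illustrative. The returned strings are the labels from the diagram, not identifiers accepted by the API.

```python
def choose_method(data_type, complex_patterns=False, long_context=False):
    """Map the decision tree above onto a method label."""
    if data_type == "tabular":
        return "CTGAN/CopulaGAN" if complex_patterns else "Statistical (Copula)"
    if data_type == "text":
        return "LLM/GPT" if long_context else "Template"
    if data_type == "time_series":
        return "TimeGAN"
    raise ValueError(f"no method listed for data type: {data_type!r}")

print(choose_method("tabular"))
print(choose_method("tabular", complex_patterns=True))
print(choose_method("text", long_context=True))
```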

API Reference

Datasets API

Base URL: /api/v1/datasets

GET    /                    List all datasets
POST   /                    Create/import dataset
GET    /{id}                Get dataset details
PUT    /{id}                Update dataset
DELETE /{id}                Delete dataset
GET    /{id}/versions       List versions
GET    /{id}/profile        Get profile results
POST   /{id}/profile        Trigger profiling
GET    /{id}/sample         Get data sample
POST   /{id}/export         Export dataset

Profiles API

Base URL: /api/v1/profiles

GET    /                    List all profiles
POST   /                    Create profile
GET    /{id}                Get profile details
PUT    /{id}                Update profile
DELETE /{id}                Delete profile
POST   /{id}/validate       Validate data against profile

Recipes API

Base URL: /api/v1/recipes

GET    /                    List all recipes
POST   /                    Create recipe
GET    /{id}                Get recipe details
PUT    /{id}                Update recipe
DELETE /{id}                Delete recipe
POST   /{id}/apply          Apply recipe to dataset
POST   /{id}/preview        Preview recipe results

Generators API

Base URL: /api/v1/generators

GET    /                    List generator configs
POST   /                    Create generator config
GET    /{id}                Get config details
PUT    /{id}                Update config
DELETE /{id}                Delete config
POST   /{id}/run            Start generation job

Jobs API

Base URL: /api/v1/jobs

GET    /                    List all jobs
GET    /{id}                Get job details
POST   /{id}/cancel         Cancel running job
GET    /{id}/logs           Get job logs
GET    /{id}/results        Get job results
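
A typical client flow is to start a run through the Generators API and then poll the Jobs API until the job finishes. The sketch below covers the polling half using only the standard library. The host name, the "status" field, and its values are assumptions, because response schemas are not listed here. Authentication is omitted.

```python
import json
import time
import urllib.request

API_ROOT = "https://inwire.example.com/api/v1"  # hypothetical host

def run_url(config_id):
    """URL for POST /generators/{id}/run."""
    return f"{API_ROOT}/generators/{config_id}/run"

def job_url(job_id):
    """URL for GET /jobs/{id}."""
    return f"{API_ROOT}/jobs/{job_id}"

def http_json(method, url):
    """Send a request and decode a JSON response body."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def wait_for_job(job_id, fetch=http_json, interval=5.0, max_polls=120):
    """Poll GET /jobs/{id} until the job is no longer pending or running."""
    for _ in range(max_polls):
        job = fetch("GET", job_url(job_id))
        # "status" and its values are assumed names, not documented fields.
        if job.get("status") not in ("pending", "running"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling loop testable without a live server. In real use, `http_json("POST", run_url(config_id))` would start the job, and the job ID would come from its response.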

Conclusion

Synthex is a powerful tool for managing your ML data lifecycle. Whether you're:

Augmenting imbalanced datasets to improve model performance
Generating privacy-safe data for development and testing
Creating training data from scratch when real data doesn't exist
Preparing data with reproducible recipes for consistent pipelines

...Synthex provides the capabilities you need.

For questions or feedback, consult your Inwire administrator or visit the User Guide.
