Synthex User Guide
Prepare, profile, transform, and generate synthetic datasets for model training, evaluation, and privacy-safe testing.
Who This Guide Is For
- Data scientists
- Data engineers
- ML engineers
- Privacy teams
Where To Go
| Page | Use It For |
| --- | --- |
| /synthex | Synthex overview and activity. |
| /synthex/datasets | Imported datasets and dataset status. |
| /synthex/dataset-registry | Reusable dataset catalog. |
| /synthex/data-profiles | Schema, statistics, and quality profile management. |
| /synthex/data-recipes | Reusable data-cleaning and transformation recipes. |
| /synthex/data-generator | Generate synthetic records. |
| /synthex/unified-generator | Generate data with guided method selection. |
| /synthex/export-wizard | Export generated datasets for training or downstream use. |
| /synthex/training-feedback | Review feedback from training and evaluation consumers. |
Core Concepts
| Concept | Meaning |
| --- | --- |
| Dataset | A managed data collection with schema, versions, profile, source, and quality state. |
| Profile | A statistical and structural description used for validation and generation. |
| Recipe | A reusable transformation pipeline for cleaning, normalization, filtering, encoding, or augmentation. |
| Generator | A synthetic data method selected manually or automatically based on use case and modality. |
| Privacy settings | Differential privacy and sensitive-field controls used to protect source data. |
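To make the Recipe concept concrete, a recipe can be pictured as an ordered list of steps applied to every record. The sketch below is a minimal Python model of that idea and not the Synthex API: `Recipe`, `strip_whitespace`, `drop_missing`, and the sample field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Row = dict
# A step returns a transformed row, or None to filter the row out.
Step = Callable[[Row], Optional[Row]]


@dataclass
class Recipe:
    """Hypothetical model of a reusable transformation pipeline."""
    name: str
    steps: list = field(default_factory=list)

    def apply(self, rows):
        cleaned = []
        for row in rows:
            current = dict(row)  # copy so the source rows stay untouched
            for step in self.steps:
                current = step(current)
                if current is None:  # a filtering step dropped this row
                    break
            if current is not None:
                cleaned.append(current)
        return cleaned


def strip_whitespace(row):
    """Normalization: trim surrounding whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_missing(required):
    """Filtering: build a step that drops rows lacking any required field."""
    def step(row):
        if any(row.get(name) in (None, "") for name in required):
            return None
        return row
    return step


recipe = Recipe("basic-cleaning", [strip_whitespace, drop_missing(["email"])])
source = [
    {"name": " Ada ", "email": "ada@example.com "},
    {"name": "Bob", "email": "  "},
]
result = recipe.apply(source)
```

Because `apply` copies each row before transforming it, the original records are left unchanged, which matches the practice of keeping original and cleaned versions separate.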
Common Workflows
Generate privacy-safe tabular data
- Import or select a source dataset.
- Run or review the data profile.
- Apply a cleaning recipe if quality issues are present.
- Open the generator and choose the privacy-preserving use case.
- Set record count and privacy parameters.
- Run generation and review quality metrics.
- Export the dataset to Model Training or object storage.
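This guide does not state which differential privacy mechanism Synthex applies, so the following is only a generic illustration of what a privacy parameter controls. It uses the standard Laplace mechanism on a count query, where a smaller epsilon adds more noise and gives stronger protection. The function names, the seed, and the epsilon values are assumptions chosen for the example.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise; a count query has sensitivity 1."""
    return true_count + sample_laplace(laplace_scale(1.0, epsilon), rng)


def mean_abs_noise(epsilon, rng, trials=2000):
    """Average absolute error of the noisy count over many releases."""
    return sum(abs(private_count(0, epsilon, rng)) for _ in range(trials)) / trials


rng = random.Random(7)  # fixed seed so the illustration is repeatable
strict = mean_abs_noise(0.1, rng)   # stronger privacy, more noise
relaxed = mean_abs_noise(1.0, rng)  # weaker privacy, less noise
```

Comparing `strict` with `relaxed` shows the trade-off to weigh when setting privacy parameters: tighter privacy reduces the fidelity of the released statistics, which is why the quality metrics are reviewed after generation.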
Prepare edge-case test data
- Open Task Specs or Failure Triggers.
- Define the rare condition or failure mode.
- Generate a targeted dataset.
- Validate profile and label distribution.
- Send the result to Evaluation or Model Training.
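The label-distribution check in the steps above can be sketched as a comparison between the generated labels and the intended mix. This is a minimal illustration, not a Synthex feature: the function names, the sample labels, the target shares, and the 0.05 tolerance are all assumptions.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot profile an empty label list")
    return {label: count / total for label, count in counts.items()}


def total_variation(actual, target):
    """Total variation distance between two label distributions (0 = identical)."""
    labels = set(actual) | set(target)
    return 0.5 * sum(
        abs(actual.get(label, 0.0) - target.get(label, 0.0)) for label in labels
    )


# Hypothetical targeted dataset: the rare "timeout" condition should be about 30%.
generated = ["ok"] * 7 + ["timeout"] * 3
target = {"ok": 0.7, "timeout": 0.3}

distance = total_variation(label_distribution(generated), target)
within_tolerance = distance <= 0.05  # assumed threshold, tune per use case
```

A distance near 0 means the rare condition appears at the intended rate; a large distance suggests regenerating or rebalancing before sending the data onward.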
Best Practices
- Always review profile drift and quality before exporting generated data.
- Keep original, cleaned, and synthetic versions separate for traceability.
- Use privacy-preserving settings for regulated or customer-derived datasets.
- Tag datasets with owner, modality, use case, and retention requirements.
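The tagging practice can be enforced with a small pre-export check. The sketch below assumes tags are held as a plain dictionary; the key names (`owner`, `modality`, `use_case`, `retention`) and the sample values are hypothetical and may not match how Synthex stores tags.

```python
REQUIRED_TAGS = ("owner", "modality", "use_case", "retention")


def missing_tags(tags):
    """Return the required tag keys that are absent or blank, in a fixed order."""
    missing = []
    for key in REQUIRED_TAGS:
        value = tags.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


dataset_tags = {
    "owner": "data-platform",
    "modality": "tabular",
    "use_case": "privacy-safe-testing",
}
problems = missing_tags(dataset_tags)  # the retention tag has not been set
```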