Data Studio User Guide
Create projects, manage datasets, design pipelines, label and curate data, and connect data workflows to Synthex, Model Training, and Evaluation.
Who This Guide Is For
- Data engineers
- Data scientists
- Labeling teams
- ML engineers
Where To Go
| Page | Use It For |
| --- | --- |
| /data-studio | Data Studio dashboard. |
| /data-studio/new | Create a data project. |
| /data-studio/datasets | Manage datasets. |
| /data-studio/pipelines | Build data pipelines. |
| /data-studio/[projectId] | Open a project workspace. |
Core Concepts
| Concept | Meaning |
| --- | --- |
| Project | A workspace for a data objective, dataset group, pipeline, or labeling initiative. |
| Dataset | A versioned collection of files, records, labels, and metadata. |
| Pipeline | A repeatable sequence for ingestion, cleaning, validation, enrichment, or export. |
| Labeling workflow | A review and annotation process for supervised learning or evaluation data. |
| Integration | A connected data source, storage backend, or downstream consumer. |
Common Workflows
Create a dataset project
- Open Data Studio and create a project.
- Import or connect data.
- Review schema and quality.
- Create a cleaning or transformation pipeline.
- Label or curate records if needed.
- Export to Synthex, Model Training, or Evaluation.
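The schema and quality review step can also be approximated offline before importing. The sketch below is not a Data Studio API: `profile_records` and the sample records are hypothetical, and it assumes the data is available as a list of Python dictionaries. It reports, per field, which value types appear and how many records have no value.

```python
from collections import Counter
from typing import Any


def profile_records(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each field: observed type names and count of missing values."""
    fields = {key for record in records for key in record}
    profile: dict[str, dict[str, Any]] = {}
    for field in sorted(fields):
        # Count the type of every value that is actually present.
        types = Counter(
            type(record[field]).__name__
            for record in records
            if record.get(field) is not None
        )
        # A field counts as missing when the key is absent or the value is None.
        missing = sum(1 for record in records if record.get(field) is None)
        profile[field] = {"types": dict(types), "missing": missing}
    return profile


# Hypothetical sample: mixed id types and a partly missing label.
sample = [
    {"id": 1, "text": "hello", "label": "greeting"},
    {"id": 2, "text": "bye", "label": None},
    {"id": "3", "text": "thanks"},
]
report = profile_records(sample)
for field, stats in report.items():
    print(field, stats)
```

A field that shows more than one type (here `id`) or a high missing count (here `label`) is a likely candidate for a cleaning or transformation step.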
Build a reusable pipeline
- Open Pipelines.
- Choose source and destination.
- Add validation, transformation, and enrichment steps.
- Run the pipeline on a sample of the data.
- Fix any errors the sample run reports.
- Schedule or save the pipeline.
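To make the idea of a repeatable sequence of steps concrete, here is a minimal sketch of a pipeline as an ordered list of functions. It does not reflect how Data Studio implements pipelines: the `Pipeline` class, the step functions, and the `limit` parameter used for the sample run are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

Record = dict[str, object]
Step = Callable[[Record], Record]


class ValidationError(ValueError):
    """Raised by a validation step when a record should be rejected."""


def require_text(record: Record) -> Record:
    # Validation: reject records with no usable text.
    text = record.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValidationError("missing or empty 'text'")
    return record


def normalize_text(record: Record) -> Record:
    # Transformation: trim whitespace and lowercase the text.
    return {**record, "text": str(record["text"]).strip().lower()}


def add_length(record: Record) -> Record:
    # Enrichment: attach a derived field.
    return {**record, "text_length": len(str(record["text"]))}


@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def run(self, records: Iterable[Record], limit: Optional[int] = None):
        """Apply every step to each record; collect failures instead of stopping."""
        passed, failed = [], []
        for index, record in enumerate(records):
            if limit is not None and index >= limit:
                break  # sample run: stop after the first `limit` records
            try:
                for step in self.steps:
                    record = step(record)
                passed.append(record)
            except ValidationError as error:
                failed.append((index, str(error)))
        return passed, failed


pipeline = Pipeline([require_text, normalize_text, add_length])
source = [{"text": "  Hello World "}, {"text": ""}, {"text": "Third"}]

# Sample run on the first two records only.
passed, failed = pipeline.run(source, limit=2)
print(passed)
print(failed)
```

Collecting failures with their record index, rather than stopping at the first one, mirrors the run-a-sample-then-fix loop: one sample run surfaces every rejected record at once.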
Best Practices
- Keep raw, cleaned, labeled, and exported datasets versioned separately.
- Record source ownership and retention requirements.
- Use validation checks before exporting to training or evaluation.
- Document label definitions and reviewer expectations.
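Several of these practices can be expressed as a small data model. The sketch below is illustrative only: `DatasetVersion`, `export_version`, the stage names, the identifier format, and the example checks are assumptions, not Data Studio objects. It keeps each stage as a separately identified version, carries source ownership and retention with it, and blocks export when a validation check fails.

```python
from dataclasses import dataclass
from typing import Callable

STAGES = ("raw", "cleaned", "labeled", "exported")
Check = Callable[[dict], bool]


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    stage: str
    version: int
    source_owner: str     # who owns the source data
    retention_days: int   # how long the data may be kept

    def __post_init__(self) -> None:
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")

    @property
    def identifier(self) -> str:
        # Hypothetical naming scheme: one identifier per stage and version.
        return f"{self.name}/{self.stage}/v{self.version}"


def failed_checks(records: list[dict], checks: dict[str, Check]) -> list[str]:
    """Return the names of checks that at least one record fails."""
    return [
        name for name, check in checks.items()
        if not all(check(record) for record in records)
    ]


def export_version(
    source: DatasetVersion, records: list[dict], checks: dict[str, Check]
) -> DatasetVersion:
    """Create the exported version only if every validation check passes."""
    failures = failed_checks(records, checks)
    if failures:
        raise ValueError(f"export blocked for {source.identifier}: {failures}")
    return DatasetVersion(
        source.name, "exported", source.version,
        source.source_owner, source.retention_days,
    )


labeled = DatasetVersion("support-tickets", "labeled", 3, "support-ops", 365)
checks = {
    "has_text": lambda record: bool(record.get("text")),
    "has_label": lambda record: bool(record.get("label")),
}
exported = export_version(labeled, [{"text": "hi", "label": "greeting"}], checks)
print(exported.identifier)
```

Making the version record immutable (`frozen=True`) means an export always produces a new record instead of modifying the labeled one, so the raw, cleaned, labeled, and exported versions stay distinct.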