Model Training User Guide
Run experiments and training jobs, tune hyperparameters, select datasets, manage approvals and recipes, and observe training behavior.
Who This Guide Is For
- ML engineers
- Data scientists
- Training platform operators
Where To Go
| Page | Use It For |
|---|---|
| /model-training | Training dashboard. |
| /model-training/new | Create a training job. |
| /model-training/experiments | Track experiments and runs. |
| /model-training/jobs | Monitor submitted jobs. |
| /model-training/datasets | Select and review training datasets. |
| /model-training/hpt | Run hyperparameter tuning. |
| /model-training/templates | Reusable training templates. |
| /model-training/recipes | Reusable training recipes. |
| /model-training/approvals | Approve promotion of trained outputs. |
| /model-training/observability | Review training metrics and system behavior. |
Core Concepts
| Concept | Meaning |
|---|---|
| Experiment | A named workspace for related training attempts. |
| Run | One execution of a training configuration. |
| Job | The scheduled compute workload behind a run. |
| Artifact | A model, checkpoint, metric file, log, or generated asset from training. |
| Template | A reusable job configuration for repeated training patterns. |
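The relationships between these concepts can be pictured as a small object model. This is a minimal Python sketch, not the platform's actual schema: the class and field names (`Template.config`, `Run.job`, `Artifact.uri`, the status value, and the example paths) are hypothetical. It only illustrates that an experiment groups runs, each run is backed by a job, artifacts belong to a run, and a template supplies a starting configuration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Template:
    """Reusable job configuration for a repeated training pattern (hypothetical fields)."""
    name: str
    config: dict


@dataclass
class Artifact:
    """Output of training: model, checkpoint, metric file, log, or other asset."""
    kind: str  # e.g. "model", "checkpoint", "metrics", "log"
    uri: str


@dataclass
class Job:
    """Scheduled compute workload that executes a run."""
    job_id: str
    status: str = "PENDING"  # hypothetical status value


@dataclass
class Run:
    """One execution of a training configuration."""
    run_id: str
    config: dict
    template: Optional[str] = None  # name of the template the config came from, if any
    job: Optional[Job] = None
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Experiment:
    """Named workspace that groups related runs."""
    name: str
    runs: List[Run] = field(default_factory=list)


# Illustrative usage: one experiment, one run started from a template.
tpl = Template("baseline-classifier", {"lr": 0.001, "epochs": 10})
exp = Experiment("churn-model")
run = Run("run-001", config=dict(tpl.config), template=tpl.name, job=Job("job-abc"))
run.artifacts.append(Artifact("checkpoint", "runs/run-001/epoch-10.ckpt"))
exp.runs.append(run)
```

Copying the template's config into the run (`dict(tpl.config)`) keeps later edits to a run from silently changing the shared template.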
Common Workflows
Start a training experiment
- Open Model Training and create an experiment.
- Select a dataset and its version.
- Choose a template or configure training manually.
- Set compute resources and environment variables.
- Submit the job.
- Monitor metrics, logs, and artifacts.
- Register the best output into ModelOps.
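Most of this workflow comes down to assembling a job request: a dataset pinned to a version, parameters taken from a template or set by hand, and compute resources and environment variables. The sketch below assumes a hypothetical `JobSpec` shape and hypothetical `build_spec`/`validate` helpers; the real form on /model-training/new may use different fields. It shows one common pattern (template values first, manual overrides on top) plus a pre-submit check that the dataset version is pinned.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobSpec:
    """Hypothetical training job request; field names are illustrative only."""
    experiment: str
    dataset: str
    dataset_version: str
    params: dict
    gpus: int = 1
    memory_gb: int = 16
    env: dict = field(default_factory=dict)


def build_spec(experiment: str, dataset: str, dataset_version: str,
               template: Optional[dict] = None, overrides: Optional[dict] = None,
               **resources) -> JobSpec:
    """Start from the template's parameters (if any), then apply manual overrides."""
    params = dict(template or {})  # copy, so the shared template is never mutated
    params.update(overrides or {})
    return JobSpec(experiment, dataset, dataset_version, params, **resources)


def validate(spec: JobSpec) -> List[str]:
    """Return a list of problems; an empty list means the spec looks ready to submit."""
    problems = []
    if not spec.dataset_version:
        problems.append("dataset version must be pinned")
    if spec.gpus < 0:
        problems.append("gpus must be non-negative")
    if not spec.params:
        problems.append("no training parameters set")
    return problems


# Illustrative values: reuse a template but shorten the run, on 2 GPUs with a fixed seed.
template = {"lr": 0.001, "batch_size": 32, "epochs": 10}
spec = build_spec("churn-model", "customer-events", "v3", template=template,
                  overrides={"epochs": 5}, gpus=2, env={"SEED": "42"})
problems = validate(spec)  # empty here, so the spec would be submitted
```

Pinning the dataset version at submit time is what later makes a run reproducible and comparable with its siblings.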
Run hyperparameter tuning
- Choose an existing experiment.
- Open HPT (/model-training/hpt).
- Define the search space and objective metric.
- Select a strategy such as random, grid, or Bayesian search.
- Launch tuning.
- Compare candidate runs and promote the winner.
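Grid and random search can be sketched in a few lines. This is a generic illustration, not the HPT page's implementation: `fake_val_accuracy` is a toy stand-in for launching a run and reading its objective metric, and the search space is given as discrete value lists for brevity (random search in practice often samples continuous ranges). Bayesian search, which picks each new candidate based on earlier results, is not shown.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, object]


def grid_candidates(space: Dict[str, list]) -> List[Params]:
    """Grid search: every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]


def random_candidates(space: Dict[str, list], n: int, seed: int = 0) -> List[Params]:
    """Random search: n independent draws, one value per parameter."""
    rng = random.Random(seed)  # seeded so the candidate list is reproducible
    return [{k: rng.choice(values) for k, values in space.items()} for _ in range(n)]


def tune(candidates: List[Params], objective: Callable[[Params], float],
         maximize: bool = True) -> Tuple[Params, float, list]:
    """Score every candidate with the objective metric; return the best plus all results."""
    results = [(params, objective(params)) for params in candidates]
    pick = max if maximize else min
    best_params, best_score = pick(results, key=lambda r: r[1])
    return best_params, best_score, results


def fake_val_accuracy(p: Params) -> float:
    # Toy stand-in for "train a run, read its validation accuracy"; peaks at lr=0.01, depth=6.
    return 1.0 - abs(p["lr"] - 0.01) * 10 - abs(p["depth"] - 6) * 0.02


space = {"lr": [0.001, 0.01, 0.1], "depth": [4, 6, 8]}
best, score, _ = tune(grid_candidates(space), fake_val_accuracy)  # 9 candidates
best_rand, score_rand, _ = tune(random_candidates(space, n=4), fake_val_accuracy)  # 4 candidates
```

The grid grows with the product of the list sizes (here 3 × 3 = 9 runs), which is why a fixed-budget random search is often preferred once the space has more than a few parameters. The `results` list returned by `tune` is what you would compare before promoting the winner.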
Best Practices
- Record dataset version, code version, parameters, metrics, and artifacts for every run.
- Use templates for repeatable training patterns.
- Keep production candidate approvals separate from exploratory runs.
- Register only validated model artifacts into ModelOps.
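These practices can be enforced as a checklist before anything reaches ModelOps. A minimal sketch, assuming a hypothetical `RunRecord` schema and a `can_register` gate that are illustrative, not part of the platform: a run qualifies only if its dataset version, code version, parameters, metrics, and artifacts are all recorded, it has moved out of the exploratory stage, and it has been validated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunRecord:
    """Metadata every run should carry (hypothetical schema)."""
    run_id: str
    dataset_version: str = ""
    code_version: str = ""  # e.g. a git commit SHA
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)
    stage: str = "exploratory"  # "exploratory" or "candidate"
    validated: bool = False


def missing_fields(rec: RunRecord) -> List[str]:
    """Names of required metadata fields that are still empty."""
    required = ["dataset_version", "code_version", "params", "metrics", "artifacts"]
    return [name for name in required if not getattr(rec, name)]


def can_register(rec: RunRecord) -> bool:
    """Gate for ModelOps: complete metadata, production-candidate stage, and validated."""
    return not missing_fields(rec) and rec.stage == "candidate" and rec.validated


# Illustrative values only.
rec = RunRecord("run-042", dataset_version="customer-events@v3", code_version="a1b2c3d",
                params={"lr": 0.001}, metrics={"val_f1": 0.87}, artifacts=["model.pt"])
before = can_register(rec)  # False: still exploratory and not yet validated
rec.stage, rec.validated = "candidate", True  # stands in for review and approval
after = can_register(rec)  # True
```

Keeping `stage` separate from `validated` mirrors the advice to keep production candidate approvals apart from exploratory runs: a well-documented experiment is still not eligible for registration until it is explicitly promoted.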