Model Training User Guide

This guide covers experiments, training jobs, hyperparameter tuning, dataset selection, approvals, recipes, and training observability.

Who This Guide Is For

ML engineers
Data scientists
Training platform operators

Where To Go

Page	Use It For
`/model-training`	Training dashboard.
`/model-training/new`	Create a training job.
`/model-training/experiments`	Track experiments and runs.
`/model-training/jobs`	Monitor submitted jobs.
`/model-training/datasets`	Select and review training datasets.
`/model-training/hpt`	Run hyperparameter tuning.
`/model-training/templates`	Reusable training templates.
`/model-training/recipes`	Reusable training recipes.
`/model-training/approvals`	Approve promotion of trained outputs.
`/model-training/observability`	Review training metrics and system behavior.

Core Concepts

Concept	Meaning
Experiment	A named workspace for related training attempts.
Run	One execution of a training configuration.
Job	The scheduled compute workload behind a run.
Artifact	A model, checkpoint, metric file, log, or generated asset from training.
Template	A reusable job configuration for repeated training patterns.
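
The relationships between these concepts can be sketched as plain data classes. This is an illustrative model only: the class names, fields, and the `instantiate` helper are assumptions made for the example and do not describe the platform's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Artifact:
    """A model, checkpoint, metric file, log, or other generated asset."""
    name: str
    kind: str  # hypothetical labels such as "model", "checkpoint", "log"


@dataclass
class Job:
    """The scheduled compute workload behind a run."""
    job_id: str
    status: str = "pending"


@dataclass
class Run:
    """One execution of a training configuration."""
    run_id: str
    config: Dict[str, object]
    job: Optional[Job] = None
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Experiment:
    """A named workspace that groups related training attempts."""
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Template:
    """A reusable job configuration for repeated training patterns."""
    name: str
    defaults: Dict[str, object]

    def instantiate(self, **overrides: object) -> Dict[str, object]:
        # Build a new dict so the template defaults are never mutated.
        return {**self.defaults, **overrides}


# One experiment holds many runs; each run has one job and any number of artifacts.
template = Template("baseline", {"epochs": 10, "learning_rate": 0.001})
experiment = Experiment("churn-model")
run = Run("run-1", template.instantiate(learning_rate=0.01), job=Job("job-1"))
run.artifacts.append(Artifact("model.bin", "model"))
experiment.runs.append(run)
```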

Common Workflows

Start a training experiment

Open Model Training and create an experiment.
Select a dataset and version.
Choose a template or configure training manually.
Set compute resources and environment variables.
Submit the job.
Monitor metrics, logs, and artifacts.
Register the best output in ModelOps.
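
The inputs gathered in the steps above can be sketched as a single job specification that is checked before submission. This guide does not document a submission API, so `TrainingJobSpec`, its field names, and the sample values are hypothetical; the sketch only shows which choices the workflow asks for.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class TrainingJobSpec:
    """Hypothetical container for the choices made in the workflow."""
    experiment: str
    dataset: str
    dataset_version: str
    template: str = ""  # empty string means "configured manually"
    gpus: int = 0
    cpu_cores: int = 1
    memory_gb: int = 4
    env: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        # Catch missing inputs before the job is submitted.
        if not self.experiment:
            raise ValueError("experiment name is required")
        if not self.dataset or not self.dataset_version:
            raise ValueError("dataset and dataset version are both required")
        if self.gpus < 0 or self.cpu_cores < 1 or self.memory_gb < 1:
            raise ValueError("compute resources are out of range")


spec = TrainingJobSpec(
    experiment="churn-model",
    dataset="customer-events",
    dataset_version="v3",
    template="baseline",
    gpus=1,
    cpu_cores=8,
    memory_gb=32,
    env={"LOG_LEVEL": "info"},
)
spec.validate()

# Serialize the spec so it can be reviewed or stored alongside the run.
payload = json.dumps(asdict(spec), indent=2, sort_keys=True)
print(payload)
```

Pinning the dataset version in the specification makes it possible to reproduce the run later.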

Run hyperparameter tuning

Choose an existing experiment.
Open HPT.
Define the search space and objective metric.
Select a strategy such as random, grid, or Bayesian search.
Launch tuning.
Compare candidate runs and promote the winner.
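
As a rough illustration of the search strategies, the sketch below enumerates a small search space with grid search, samples it with random search, and picks the winner by objective metric. The function names, the search space, and `toy_objective` are assumptions for the example; a real tuning job would train a model for each candidate, and Bayesian search is not shown.

```python
import itertools
import random
from typing import Callable, Dict, Iterable, Iterator, Sequence, Tuple

SearchSpace = Dict[str, Sequence[float]]
Params = Dict[str, float]


def grid_candidates(space: SearchSpace) -> Iterator[Params]:
    """Yield every combination of the listed values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(space: SearchSpace, n_trials: int, seed: int = 0) -> Iterator[Params]:
    """Yield n_trials combinations sampled uniformly from the listed values."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {name: rng.choice(list(values)) for name, values in space.items()}


def best_run(
    candidates: Iterable[Params],
    objective: Callable[[Params], float],
    maximize: bool = True,
) -> Tuple[Params, float]:
    """Score every candidate and return the best (params, score) pair."""
    scored = [(params, objective(params)) for params in candidates]
    pick = max if maximize else min
    return pick(scored, key=lambda item: item[1])


def toy_objective(params: Params) -> float:
    # Stand-in for a validation metric; highest at learning_rate=0.01, batch_size=64.
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 64) / 1000


space: SearchSpace = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

grid_winner, grid_score = best_run(grid_candidates(space), toy_objective)
random_winner, random_score = best_run(random_candidates(space, n_trials=5), toy_objective)
print("grid:", grid_winner, grid_score)
print("random:", random_winner, random_score)
```

Grid search evaluates all nine combinations here, so its winner is the best point in the listed space. Random search evaluates only five samples and may or may not find the same point, which is the usual trade-off between coverage and cost.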

Best Practices

Record dataset version, code version, parameters, metrics, and artifacts for every run.
Use templates for repeatable training patterns.
Keep approvals for production candidates separate from exploratory runs.
Register only validated model artifacts into ModelOps.
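
The record-keeping practice above can be sketched as a per-run record with a completeness check. `RunRecord`, its fields, and the sample values are hypothetical and are not the platform's tracking format; the point is only that a run missing any of the listed items is easy to detect before it is considered for registration.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    """Hypothetical record of what to capture for every run."""
    run_id: str
    dataset_version: str
    code_version: str
    parameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        # A field counts as missing when it is an empty string, dict, or list.
        return [name for name, value in asdict(self).items() if not value]


complete = RunRecord(
    run_id="run-1",
    dataset_version="v3",
    code_version="abc1234",
    parameters={"learning_rate": 0.01},
    metrics={"val_accuracy": 0.91},
    artifacts=["model.bin"],
)
incomplete = RunRecord(run_id="run-2", dataset_version="v3", code_version="")

print(complete.missing_fields())
print(incomplete.missing_fields())
```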