Model Training User Guide

Use Model Training to run experiments and training jobs, tune hyperparameters, select datasets, manage approvals and recipes, and observe training behavior.

Who This Guide Is For

Where To Go

Page                           Use It For
/model-training                Training dashboard.
/model-training/new            Create a training job.
/model-training/experiments    Track experiments and runs.
/model-training/jobs           Monitor submitted jobs.
/model-training/datasets       Select and review training datasets.
/model-training/hpt            Run hyperparameter tuning.
/model-training/templates      Reusable training templates.
/model-training/recipes        Reusable training recipes.
/model-training/approvals      Approve promotion of trained outputs.
/model-training/observability  Review training metrics and system behavior.

Core Concepts

Concept     Meaning
Experiment  A named workspace for related training attempts.
Run         One execution of a training configuration.
Job         The scheduled compute workload behind a run.
Artifact    A model, checkpoint, metric file, log, or generated asset from training.
Template    A reusable job configuration for repeated training patterns.
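The relationships in the concepts table can be pictured as a small data model: an experiment holds runs, each run is backed by one job and produces artifacts, and a template supplies a reusable starting configuration. The sketch below is a hypothetical Python illustration; the class names, field names, and job id scheme are assumptions made for explanation, not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Template:
    """Reusable job configuration for repeated training patterns (hypothetical fields)."""
    name: str
    config: Dict[str, Any]


@dataclass
class Job:
    """Scheduled compute workload that executes one run."""
    job_id: str
    status: str = "queued"  # hypothetical status value


@dataclass
class Artifact:
    """Something a run produces: model, checkpoint, metric file, log, ..."""
    kind: str
    uri: str  # placeholder location, not a real storage path


@dataclass
class Run:
    """One execution of a training configuration, backed by one job."""
    run_id: str
    config: Dict[str, Any]
    job: Job
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Experiment:
    """Named workspace grouping related runs."""
    name: str
    runs: List[Run] = field(default_factory=list)

    def add_run(self, run_id: str, config: Dict[str, Any]) -> Run:
        # Copy the config so later edits to a template do not leak into past runs.
        run = Run(run_id=run_id, config=dict(config), job=Job(job_id=f"job-{run_id}"))
        self.runs.append(run)
        return run


# Usage: two runs from the same (hypothetical) template, differing only in learning rate.
base = Template(name="tabular-baseline", config={"epochs": 10, "lr": 1e-3})
exp = Experiment(name="churn-model")
r1 = exp.add_run("r1", base.config)
r2 = exp.add_run("r2", {**base.config, "lr": 1e-2})
r1.artifacts.append(Artifact(kind="checkpoint", uri="checkpoints/r1/final.pt"))
```

Copying the configuration into each run reflects the idea that a run records exactly what was executed, even if the template changes later.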

Common Workflows

Start a training experiment

  1. Open Model Training and create an experiment.
  2. Select a dataset and a dataset version.
  3. Choose a template or configure training manually.
  4. Set compute resources and environment variables.
  5. Submit the job.
  6. Monitor metrics, logs, and artifacts.
  7. Register the best-performing output in ModelOps.
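Steps 2 to 5 amount to assembling a job specification: template defaults first, then your manual choices for dataset version, compute resources, and environment variables. The sketch below assumes a hypothetical dictionary layout (`dataset`, `resources`, `env`) and hypothetical helpers `merge` and `build_job_spec`; it is not the product's submission format, only an illustration of how overrides can layer on top of a template and which fields are worth checking before you submit.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical template defaults; field names are illustrative only.
BASE_TEMPLATE: Dict[str, Any] = {
    "dataset": {"name": None, "version": None},
    "resources": {"gpus": 1, "cpus": 4, "memory_gb": 16},
    "env": {},
}


def merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with overrides applied; nested dicts merge key by key."""
    result = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def build_job_spec(template: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Layer manual settings over a template and check the fields a job needs."""
    spec = merge(template, overrides)
    dataset = spec.get("dataset") or {}
    if not dataset.get("name") or not dataset.get("version"):
        # Pinning an explicit dataset version keeps the run reproducible.
        raise ValueError("select both a dataset and a dataset version")
    for key, count in spec.get("resources", {}).items():
        if count < 0:
            raise ValueError(f"resource {key!r} must be non-negative")
    for name, value in spec.get("env", {}).items():
        if not isinstance(value, str):
            raise ValueError(f"environment variable {name} must be a string")
    return spec


# Usage: start from the template, pin a dataset version, and request two GPUs.
spec = build_job_spec(
    BASE_TEMPLATE,
    {"dataset": {"name": "churn", "version": "v3"}, "resources": {"gpus": 2}, "env": {"SEED": "42"}},
)
```

Because `merge` works on copies, the template itself is never modified, so the same template can seed many jobs.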

Run hyperparameter tuning

  1. Choose an existing experiment.
  2. Open Hyperparameter Tuning (HPT).
  3. Define the search space and objective metric.
  4. Select a strategy such as random, grid, or Bayesian search.
  5. Launch tuning.
  6. Compare the candidate runs on the objective metric and promote the best one.
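Grid and random search differ only in how candidate configurations are generated; both then score each candidate run on the objective metric and keep the best. The sketch below is a minimal, self-contained Python illustration in which a toy function stands in for real training runs; `grid_candidates`, `random_candidates`, and `tune` are hypothetical names. For simplicity, random search here draws from the listed values, whereas tuning services often sample continuous ranges. Bayesian search, which chooses each new candidate using the scores observed so far, is not shown.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def grid_candidates(space: Dict[str, Sequence[float]]) -> List[Params]:
    """Grid search: every combination of the listed values."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*(space[n] for n in names))]


def random_candidates(space: Dict[str, Sequence[float]], n: int, seed: int = 0) -> List[Params]:
    """Random search: n independent draws, one value per parameter per draw."""
    rng = random.Random(seed)  # seeded so the candidate list is repeatable
    return [{name: rng.choice(list(values)) for name, values in space.items()} for _ in range(n)]


def tune(
    candidates: Sequence[Params],
    objective: Callable[[Params], float],
    maximize: bool = True,
) -> Tuple[Params, float]:
    """Score each candidate 'run' on the objective metric and return the best one."""
    scored = [(params, objective(params)) for params in candidates]
    pick = max if maximize else min
    return pick(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy validation loss with its minimum at lr=0.01, dropout=0.2.
    def toy_val_loss(p: Params) -> float:
        return (p["lr"] - 0.01) ** 2 + (p["dropout"] - 0.2) ** 2

    space = {"lr": [0.001, 0.01, 0.1], "dropout": [0.0, 0.2, 0.5]}
    best, loss = tune(grid_candidates(space), toy_val_loss, maximize=False)
    print(best, loss)  # {'lr': 0.01, 'dropout': 0.2} 0.0
    print(tune(random_candidates(space, n=5, seed=1), toy_val_loss, maximize=False))
```

Grid search cost grows multiplicatively with the number of values per parameter (3 × 3 = 9 runs here), which is why random search is often preferred for larger search spaces under a fixed run budget.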

Best Practices