InferenceIQ User Guide

Optimize inference cost, latency, and throughput, improve hardware fit, tune serving engine settings, and close the loop with production feedback.

Who This Guide Is For

Where To Go

Page                                Use It For
/modelops/inferenceiq               InferenceIQ dashboard.
/modelops/inferenceiq/planner       AI-assisted optimization planner.
/modelops/inferenceiq/experiments   Experiment lifecycle and history.
/modelops/inferenceiq/hardware      L0 hardware profiling.
/modelops/inferenceiq/kernels       L1 kernel tuning.
/modelops/inferenceiq/quantization  L2 quantization.
/modelops/inferenceiq/parallelism   L3 parallelism.
/modelops/inferenceiq/speculation   L4 speculative decoding.
/modelops/inferenceiq/batching      L5 batching and scheduling.
/modelops/inferenceiq/benchmark     Benchmark and validation runs.
/modelops/inferenceiq/registry      Validated configuration registry.
/modelops/inferenceiq/monitor       Production inference feedback.

Core Concepts

Concept                  Meaning
Objective                The optimization target: throughput, latency, or cost.
Experiment               A measured optimization run over a model, engine, and hardware context.
Validated configuration  A tested set of serving parameters that can be applied to deployments.
Tool level               Optimization areas L0-L8, from hardware profiling through distillation.
Apply flow               The controlled process for moving a validated config into deployment settings.
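To make the Objective concept concrete, here is a minimal Python sketch of choosing among experiment results once an objective is fixed. It is not the InferenceIQ API: the `Objective` enum, the `ExperimentResult` fields, and the `score` and `best_result` functions are hypothetical names, and the numbers in the example are invented for illustration. The one assumption that matters is direction: higher throughput is better, while lower latency and lower cost are better, so the score negates the latter two.

```python
from dataclasses import dataclass
from enum import Enum


class Objective(Enum):
    """The three optimization targets named in the guide."""
    THROUGHPUT = "throughput"
    LATENCY = "latency"
    COST = "cost"


@dataclass(frozen=True)
class ExperimentResult:
    """Hypothetical summary of one measured experiment run (not an InferenceIQ type)."""
    name: str
    tokens_per_second: float        # higher is better
    p95_latency_ms: float           # lower is better
    cost_per_million_tokens: float  # lower is better


def score(result: ExperimentResult, objective: Objective) -> float:
    """Map a result to a score where higher is always better for the objective."""
    if objective is Objective.THROUGHPUT:
        return result.tokens_per_second
    if objective is Objective.LATENCY:
        return -result.p95_latency_ms
    return -result.cost_per_million_tokens


def best_result(results: list[ExperimentResult], objective: Objective) -> ExperimentResult:
    """Return the experiment that best serves the chosen objective."""
    if not results:
        raise ValueError("no experiment results to compare")
    return max(results, key=lambda r: score(r, objective))


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    runs = [
        ExperimentResult("candidate-a", 1800.0, 420.0, 0.90),
        ExperimentResult("candidate-b", 2600.0, 380.0, 0.62),
        ExperimentResult("candidate-c", 3100.0, 310.0, 1.10),
    ]
    for objective in Objective:
        print(f"{objective.value}: {best_result(runs, objective).name}")
```

With these sample numbers the fastest candidate is not the cheapest one, which is why the objective is chosen before experiments are compared.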

Common Workflows

Run an optimization loop

  1. Choose an objective and the model or deployment context.
  2. Run hardware profiling (L0) if capacity is unknown.
  3. Run the recommended L0-L8 experiments.
  4. Benchmark the candidates against quality and SLO targets.
  5. Save the candidate that meets those targets as a validated configuration.
  6. Apply the configuration to an existing deployment or use it in the new deployment wizard.
  7. Monitor production feedback.
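One way to read step 4 of the optimization loop is as a gate: a candidate is worth saving as a validated configuration only if it meets every quality and SLO target. A minimal Python sketch of that check follows, assuming three illustrative targets (a quality score, a p95 latency ceiling, and a throughput floor). The `Targets`, `Candidate`, `failed_checks`, and `validated` names are hypothetical and not part of InferenceIQ; substitute whatever metrics your own benchmark runs report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    """Hypothetical thresholds; set these from your own quality and SLO requirements."""
    min_quality: float            # e.g. score on a held-out eval set, 0.0-1.0
    max_p95_latency_ms: float     # latency SLO ceiling
    min_tokens_per_second: float  # throughput SLO floor


@dataclass(frozen=True)
class Candidate:
    """Hypothetical benchmark summary for one candidate configuration."""
    name: str
    quality: float
    p95_latency_ms: float
    tokens_per_second: float


def failed_checks(candidate: Candidate, targets: Targets) -> list[str]:
    """List every target the candidate misses; an empty list means it passes."""
    failures = []
    if candidate.quality < targets.min_quality:
        failures.append("quality")
    if candidate.p95_latency_ms > targets.max_p95_latency_ms:
        failures.append("latency SLO")
    if candidate.tokens_per_second < targets.min_tokens_per_second:
        failures.append("throughput SLO")
    return failures


def validated(candidates: list[Candidate], targets: Targets) -> list[Candidate]:
    """Keep only the candidates that pass every check, preserving input order."""
    return [c for c in candidates if not failed_checks(c, targets)]
```

Returning the list of failed checks, rather than a single boolean, keeps the reason for each rejection visible, which is useful when deciding which experiment to run next.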

Use the planner

  1. Open the Planner.
  2. Select the model, hardware, workload, and target objective.
  3. Review the recommended sequence and its confidence.
  4. Run the plan or save it.
  5. Open the plan detail to track actions and evidence.
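If you copy a plan's steps out of the Planner for offline review, a small helper can list them in the recommended order and mark the low-confidence ones. This is a sketch under assumptions: the `PlanStep` shape, the 0.0 to 1.0 confidence scale, and the default 0.6 threshold are hypothetical, not documented InferenceIQ behavior. The helper deliberately keeps the planner's order rather than re-sorting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """Hypothetical shape of one planner recommendation."""
    level: int          # tool level 0-8 (L0 hardware profiling ... L8 distillation)
    action: str
    confidence: float   # assumed to be on a 0.0-1.0 scale


def review(steps: list[PlanStep], min_confidence: float = 0.6) -> list[str]:
    """Render steps in their recommended order, flagging low-confidence ones."""
    lines = []
    for position, step in enumerate(steps, start=1):
        if not 0 <= step.level <= 8:
            raise ValueError(f"unknown tool level L{step.level}")
        flag = "  <- review evidence" if step.confidence < min_confidence else ""
        lines.append(f"{position}. L{step.level} {step.action} ({step.confidence:.0%}){flag}")
    return lines


if __name__ == "__main__":
    plan = [
        PlanStep(0, "profile hardware", 0.90),
        PlanStep(2, "try INT8 quantization", 0.45),
    ]
    print("\n".join(review(plan)))
```

Flagged steps are the ones where the plan detail's evidence deserves a closer look before you run the plan.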

Best Practices