ModelOps User Guide

ModelOps is Inwire's production model lifecycle workspace. Use it to register model artifacts, review versions, build serving images, deploy models to Kubernetes, Docker, or hosted inference providers, and operate deployments with monitoring, security, governance, and InferenceIQ optimization.

This guide is written for ML engineers, platform engineers, and MLOps operators who use the Inwire web app. API paths are included where they help explain what a workflow does behind the scenes.


Table of Contents

  1. What ModelOps Manages
  2. Before You Start
  3. Navigate ModelOps
  4. Register a Model
  5. Review Models and Versions
  6. Build Serving Images
  7. Deploy a Model
  8. Operate Deployments
  9. Security and Governance
  10. InferenceIQ Optimization
  11. Common Workflows
  12. Troubleshooting
  13. Quick Reference

What ModelOps Manages

ModelOps manages the part of the ML lifecycle that starts when a model is ready to become a governed production artifact.

Area | What You Do
Model registry | Register models, manage versions, metadata, visibility, approvals, aliases, and lineage.
Builds | Generate Dockerfiles, build serving images, push images to registries, and review build logs.
Deployments | Deploy models to Kubernetes clusters, Docker VM targets, or hosted inference providers.
Manifests and GitOps | Generate manifests, store versions, open pull requests, and sync through ArgoCD.
Monitoring | Track deployment health, latency, throughput, errors, GPU usage, traces, cache behavior, and SLO status.
Governance | Run evaluations, policy checks, security scans, approvals, promotion requests, and pre-deployment gates.
InferenceIQ | Profile hardware, run optimization experiments, validate configurations, and apply safe serving improvements.

ModelOps is multi-tenant. Most resources are scoped to your organization and, where configured, your team. What you can see or change depends on your role and your organization's policies.


Before You Start

Make sure these prerequisites are in place before using ModelOps for a production deployment:

Requirement | Why It Matters
Inwire account and organization access | ModelOps actions are authenticated and scoped to your org/team.
Model artifact source | Registration requires a Hugging Face model ID, Git repository, cloud object path, upload, or existing Docker image.
Serving target | Deployment requires a Kubernetes cluster, Docker VM target, or hosted inference provider.
Container registry or managed image flow | Non-Docker-source models usually need a built serving image.
Secrets and integrations | GitHub, ArgoCD, cloud, storage, registry, or hosted provider credentials must be connected before use.
Promotion policy awareness | Production deployments may require evaluations, security scans, approvals, or gate overrides.

Default local development URLs:

Surface | URL
Frontend | http://localhost:3000/modelops
ModelOps API | http://localhost:18087
ModelOps API docs | http://localhost:18087/docs (when API docs are enabled)
Prometheus metrics | http://localhost:18087/metrics (when metrics are enabled)

Navigate ModelOps

Open ModelOps from the Inwire sidebar. The landing page highlights primary actions and status cards.

Page | Path | Use It For
Register Model | /modelops/register-model | Add a model from Hugging Face, Git, cloud storage, upload, or Docker image.
Model Registry | /modelops/registry | Browse models, inspect versions, compare metadata, and start deploy flows.
Deployments | /modelops/deployments | View active deployments, create deployments, open deployment details, and manage lifecycle actions.
Monitoring | /modelops/monitoring | Review platform and model serving metrics.
Security | /modelops/security | Review vulnerability and security posture.
Policies | /modelops/policies | Manage policy definitions.
Gates | /modelops/gates | Configure and review pre-deployment gates.
Governance | /modelops/governance | Review approvals, compliance checks, and lineage.
InferenceIQ | /modelops/inferenceiq | Run performance optimization and validation workflows.
Deployment Logs | /modelops/deployment-logs | Investigate deployment history and logs.

Register a Model

Use ModelOps -> Register Model to create a registry entry. The current registration wizard has four steps: Source, Identity, Runtime, and Review.

Step 1: Source

Choose where the model comes from.

Source | Use When | Typical Input
Hugging Face | The model is published or mirrored in Hugging Face format. | meta-llama/Llama-3.1-8B-Instruct
GitHub / GitLab | The model package and serving code live in a repository. | Repository URL
Cloud Storage | Weights are stored in S3, GCS, or Azure Blob. | Bucket/object URI plus integration details
Upload | You have a smaller local artifact suitable for direct upload. | File selection
Docker Image | You already have a serving image. | Image reference such as registry/org/model:v1

After you enter a source, ModelOps can analyze it through POST /api/v1/models/analyze. The analysis may detect framework, architecture, model type, size class, storage strategy, recommended engine, required dependencies, and whether custom code or trust_remote_code is involved.

Review warnings carefully. A source can be syntactically valid but still risky for production if it requires remote code, has incomplete metadata, or lacks a compatible serving engine.
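
For API users, the analysis call can be sketched with the Python standard library. The example below only prepares the request and does not send it. The body field names (source_type, source_uri), the bearer-token header, and the helper name build_analyze_request are assumptions for illustration, not the documented schema; check the API docs for the real contract.

```python
import json
import urllib.request

# Default local ModelOps API URL from this guide; adjust for other environments.
BASE_URL = "http://localhost:18087"


def build_analyze_request(source_type: str, source_uri: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a source-analysis request.

    The body fields and the auth header are hypothetical placeholders.
    """
    body = json.dumps({"source_type": source_type, "source_uri": source_uri}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/models/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_analyze_request("huggingface", "meta-llama/Llama-3.1-8B-Instruct", "example-token")
print(req.get_method(), req.full_url)
```

Sending the request would be a call to urllib.request.urlopen(req); the response is where fields such as detected framework or trust_remote_code warnings would be expected to appear.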

Step 2: Identity

Set the model identity:

Field | Guidance
Name | Stable registry name. Use lowercase, descriptive names for automation-friendly references.
Display name | Human-readable label for dashboards and reviews.
Description | Explain purpose, expected inputs, owner, and intended environment.
Visibility | Choose Private, Team, Organization, or Public according to data and model policy.
Tags | Add framework, domain, owner, risk, stage, or workload tags.
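
The Name guidance can be checked before submitting the wizard. The sketch below assumes a hypothetical convention (lowercase letters and digits separated by single hyphens, at most 63 characters); the rules the registry actually enforces may differ.

```python
import re

# Hypothetical convention: lowercase alphanumerics separated by single hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_NAME_LENGTH = 63  # assumed limit, not taken from ModelOps


def is_automation_friendly(name: str) -> bool:
    """Return True when the name matches the assumed convention and length limit."""
    return len(name) <= MAX_NAME_LENGTH and bool(NAME_PATTERN.match(name))


def suggest_name(display_name: str) -> str:
    """Derive a candidate registry name from a human-readable display name."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    return slug[:MAX_NAME_LENGTH].rstrip("-")


print(is_automation_friendly("llama-3-1-8b-instruct"))  # True
print(is_automation_friendly("Llama 3.1 8B"))           # False
print(suggest_name("Llama 3.1 8B Instruct"))            # llama-3-1-8b-instruct
```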

Step 3: Runtime

Select the interface and serving engine.

Interface | Use For
Chat | OpenAI-compatible chat completions.
Completion | Text completion or generation APIs.
Embedding | Vector embedding models.
Custom | Non-standard interfaces or custom handlers.

Common serving engines:

Engine | Best Fit
vLLM | High-throughput LLM serving.
Text Generation Inference (TGI) | Hugging Face text generation workloads.
Text Embeddings Inference (TEI) | Embedding workloads.
Triton Inference Server | Multi-framework GPU inference.
ONNX Runtime | Cross-platform optimized inference.

Docker-image sources can skip runtime engine selection because the image already defines runtime behavior.
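
One way to picture how the interface narrows the engine choice is a small lookup. The chat, completion, and embedding pairings follow the tables above; the pairing for custom interfaces and the source-type label docker_image are assumptions, and the wizard's own recommendation should take precedence.

```python
from typing import List

# Candidate engines per interface. The "custom" entry is an assumption.
ENGINE_CANDIDATES = {
    "chat": ["vLLM", "TGI"],
    "completion": ["vLLM", "TGI"],
    "embedding": ["TEI"],
    "custom": ["Triton Inference Server", "ONNX Runtime"],
}


def candidate_engines(interface: str, source_type: str) -> List[str]:
    """Return engines worth considering, or an empty list for Docker-image sources."""
    if source_type == "docker_image":
        # The image already defines runtime behavior, so no engine is selected.
        return []
    key = interface.lower()
    if key not in ENGINE_CANDIDATES:
        raise ValueError(f"unknown interface: {interface}")
    return list(ENGINE_CANDIDATES[key])


print(candidate_engines("Chat", "huggingface"))
print(candidate_engines("Embedding", "huggingface"))
print(candidate_engines("Chat", "docker_image"))
```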

Step 4: Review

Confirm source, identity, runtime, visibility, tags, and any source-analysis warnings. Submitting creates a model registry record through POST /api/v1/models/register-v2 or the active model creation endpoint configured for the environment.

After registration, use the registry detail page to inspect status, versions, metadata, build readiness, evaluations, and deployment history.


Review Models and Versions

Use ModelOps -> Model Registry to find and manage registered models.

Common registry tasks:

Task | Where
Search or filter models | Registry list
Open model details | Registry row or card
Compare models | ModelOps -> Model Comparison
Add a model version | Model detail -> Create Version
Review evaluation history | Model detail -> Evaluation history
Manage approval state | Model detail or governance workflows
Start a deployment | Model detail -> Deploy

Model versions may carry approval state, aliases, build references, evaluation results, lineage, and deployment history. Use aliases for stable references such as staging, production, or latest-approved instead of relying on changing version IDs in operational runbooks.
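
The value of aliases is that a runbook keeps working when the underlying version changes. The sketch below models this with an in-memory table; the class name, method names, and version IDs are hypothetical and do not represent the registry API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelAliases:
    """In-memory stand-in for a model's alias table (illustrative only)."""

    model: str
    aliases: Dict[str, str] = field(default_factory=dict)

    def point(self, alias: str, version_id: str) -> None:
        """Move an alias to a version, for example after a promotion."""
        self.aliases[alias] = version_id

    def resolve(self, ref: str) -> str:
        """Return the version ID for an alias, or the ref itself if it is not an alias."""
        return self.aliases.get(ref, ref)


registry = ModelAliases(model="support-chat")
registry.point("staging", "v7")
registry.point("production", "v5")

# A runbook that says "production" needs no edit when v7 is promoted.
registry.point("production", "v7")
print(registry.resolve("production"))  # v7
print(registry.resolve("v3"))          # v3
```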


Build Serving Images

Many registered models need a serving image before deployment. Use the build pages when ModelOps needs to package the runtime, dependencies, model loader, and inference server.

Page | Use It For
/modelops/builds/trigger | Start a model image build.
/modelops/builds/{buildId} | Watch status, logs, retries, push state, and failures.
/modelops/register-model/build-status | Review registration-related build status.

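Build capabilities include Dockerfile generation and preview, serving image builds, pushes to container registries, build logs, and retries.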

Before deploying, confirm the build is successful and either linked to the model version or selected explicitly in the deployment wizard.


Deploy a Model

Use ModelOps -> Deployments -> Create. You can also start from a model detail page with a preselected model.

The deployment wizard adapts to the target runtime:

Runtime | Steps | Best For
Kubernetes | Target, Model, Resources, Networking, Health & Reliability, Compliance, GitOps, Review | Production services on managed or private clusters.
Docker | Target, Model, Configuration, Compose Preview, Review | Fast iteration on Docker VM targets.
Hosted | Target, Model, Provider Config, Review | Managed inference providers where cluster operations are externalized.

Step 1: Target

Choose a deployment target.

Target Type | Options
Cloud Kubernetes | AWS, GCP, Azure, Alibaba Cloud, Oracle Cloud, Nebius Kubernetes where configured.
On-premises / private | Kubernetes, OpenShift, K3s, Minikube, or Docker VM targets connected through Inwire agents.
Hosted inference | SageMaker, Vertex AI, Azure ML, Nebius AI Studio, Alibaba PAI-EAS, Together AI, Fireworks, Hugging Face, Replicate, Baseten, RunPod.

For Kubernetes targets, select the connected cluster and namespace. For Docker targets, select one or more VM targets. For hosted targets, selecting the provider sets the runtime to hosted.

Step 2: Model

Select the model, model version, and optionally a build image, template, or source experiment. Prefer approved model versions for staging and production. If a required build is missing, return to the build flow before continuing.

Kubernetes Step 3: Resources

Configure serving resources:

Field | Guidance
GPU type and count | Match model size, engine requirements, and cluster availability.
CPU and memory | Reserve enough headroom for tokenization, model server overhead, and sidecars.
Replicas | Start with minimum and maximum bounds that match the traffic plan.
Inference engine | Choose vLLM, TGI, Triton, TEI, ONNX, or another catalog entry.
Autoscaling targets | Configure CPU, memory, and optional custom metric thresholds.
InferenceIQ config | Apply a validated InferenceIQ configuration when available.

The wizard can use cluster capability checks and resource recommendations. Treat infeasible recommendations as blockers unless you intentionally override with an approved capacity plan.
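
To make the idea of a feasibility blocker concrete, the sketch below compares a resource request with free capacity in matching node pools. It is a coarse aggregate check that ignores per-node packing and topology, and every type, field name, and capacity number in it is hypothetical; the wizard's own capability checks are the source of truth.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodePool:
    gpu_type: str
    free_gpus: int
    free_memory_gib: int


@dataclass(frozen=True)
class ResourceRequest:
    gpu_type: str
    gpus_per_replica: int
    memory_gib_per_replica: int
    min_replicas: int


def feasibility_blockers(request: ResourceRequest, pools: List[NodePool]) -> List[str]:
    """Return blocker messages; an empty list means the minimum replicas fit in aggregate."""
    matching = [p for p in pools if p.gpu_type == request.gpu_type]
    if not matching:
        return [f"no node pool offers GPU type {request.gpu_type}"]
    blockers = []
    need_gpus = request.gpus_per_replica * request.min_replicas
    need_mem = request.memory_gib_per_replica * request.min_replicas
    free_gpus = sum(p.free_gpus for p in matching)
    free_mem = sum(p.free_memory_gib for p in matching)
    if free_gpus < need_gpus:
        blockers.append(f"need {need_gpus} GPUs, only {free_gpus} free")
    if free_mem < need_mem:
        blockers.append(f"need {need_mem} GiB memory, only {free_mem} GiB free")
    return blockers


# Made-up capacity figures for illustration.
pools = [NodePool("A100", 4, 512), NodePool("L4", 8, 256)]
print(feasibility_blockers(ResourceRequest("A100", 2, 160, 2), pools))
print(feasibility_blockers(ResourceRequest("A100", 4, 160, 2), pools))
```

An empty result means only that the minimum replica count fits in aggregate; autoscaling up to the maximum bound would need the same check against the larger figure.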

Kubernetes Step 4: Networking

Configure exposure, service, ingress, TLS, auth, rate limiting, network policies, access logs, and metrics.

Exposure | Typical Result
Private | Cluster-internal access, usually ClusterIP.
Team | Team-scoped ingress or internal access controls.
Organization | Org-scoped ingress or access policy.
Public | External endpoint, usually with stricter TLS, auth, rate limits, and audit expectations.

Authentication options include none, API key, JWT, or mTLS. Public endpoints should normally use authentication, TLS, access logging, metrics, and rate limiting.
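
The public-endpoint guidance can be expressed as a simple pre-submit check. The configuration keys below (exposure, auth, tls, access_logs, metrics, rate_limiting) are hypothetical names chosen for this sketch and are not the wizard's payload schema.

```python
from typing import Dict, List

# Controls this guide recommends enabling on public endpoints, besides authentication.
REQUIRED_FOR_PUBLIC = ("tls", "access_logs", "metrics", "rate_limiting")


def public_exposure_warnings(networking: Dict[str, object]) -> List[str]:
    """Return warnings for a public endpoint that is missing recommended controls."""
    if networking.get("exposure") != "public":
        return []
    warnings = []
    if networking.get("auth", "none") == "none":
        warnings.append("auth is 'none'; choose api_key, jwt, or mtls")
    for key in REQUIRED_FOR_PUBLIC:
        if not networking.get(key, False):
            warnings.append(f"{key} is disabled")
    return warnings


risky = {"exposure": "public", "auth": "none", "tls": True}
for warning in public_exposure_warnings(risky):
    print(warning)
```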

Kubernetes Step 5: Health & Reliability

Configure liveness/readiness probes, deployment strategy, priority class, runtime class, pre-stop hooks, tenancy mode, weights loading mode, storage class, and GPU node pool.

Deployment strategies:

Strategy | Use When
Rolling update | Standard low-risk updates where gradual replacement is acceptable.
Canary | You want partial traffic before full rollout.
Blue/Green | You need a clean switch between old and new versions.

Weights loading modes:

Mode | Use When
Init container | Weights are downloaded before the serving container starts.
Persistent volume claim | Weights should persist across pod restarts.
Node-local cache | Repeated deployments should reuse local node cache.
Streaming loader | Runtime supports streaming or lazy loading.

Kubernetes Step 6: Compliance

Select compliance frameworks and enforcement details when the deployment handles regulated or sensitive data.

Supported framework labels include GDPR, HIPAA, SOC 2, PCI DSS, and ISO 27001. Configure data residency, encryption at rest, encryption in transit, audit retention, access-control policy, and sensitive-data handling.

If a deployment gate fails but business approval exists, use a documented gate override. Overrides should include a clear justification and, where possible, an expiration.
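
A documented override amounts to a record with a justification and an expiry. The sketch below shows one way to represent and sanity-check such a record; the field names, the minimum justification length, and the sample values are assumptions, not the Gates API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class GateOverride:
    gate: str
    justification: str
    approved_by: str
    expires_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        """An override without an expiry never lapses, which is what to avoid."""
        return self.expires_at is None or now < self.expires_at


def override_problems(override: GateOverride, min_justification_chars: int = 20) -> List[str]:
    """Return reasons the override record would be hard to audit."""
    problems = []
    if len(override.justification.strip()) < min_justification_chars:
        problems.append("justification is too short to audit")
    if override.expires_at is None:
        problems.append("no expiration set; override would be permanent")
    return problems


override = GateOverride(
    gate="max_critical_vulns",
    justification="Vendor patch scheduled; risk accepted by security lead for pilot traffic only.",
    approved_by="security-lead",
    expires_at=datetime(2025, 1, 31, tzinfo=timezone.utc),
)
print(override_problems(override))
print(override.is_active(datetime(2025, 1, 15, tzinfo=timezone.utc)))
```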

Kubernetes Step 7: GitOps

Enable GitOps when deployments should flow through repository review and ArgoCD sync instead of direct deploy.

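Configure the target branch (main by default), pull request creation, and ArgoCD sync for the generated manifests.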

Use GitOps for production and shared environments when your organization requires reviewable infrastructure changes.

Docker Step: Configuration and Compose Preview

For Docker targets, configure image/runtime values, resources, environment variables, health checks, and generated Docker Compose output. Use the Compose Preview step to inspect the generated service definition before submitting.

Docker deployments are useful for quick experiments, internal demos, and optimization loops. They are not a substitute for production Kubernetes controls unless your organization has explicitly approved Docker VM production targets.

Hosted Step: Provider Config

Hosted inference targets collect provider-specific settings such as instance type, accelerator, region, endpoint type, autoscaling bounds, service account or IAM role, scaling profile, or hardware tier.

Review provider-specific cost, security, and data-handling obligations before deployment. ModelOps tracks the deployment, but the provider owns parts of the runtime behavior.

Review and Submit

The review step summarizes the deployment payload and highlights target, model, resource, networking, compliance, GitOps, advanced, and hosted settings. Submitting creates the deployment through POST /api/v1/deployments-wizard and redirects to the deployment detail page.


Operate Deployments

Use ModelOps -> Deployments to monitor and manage live deployments.

Common actions:

Action | Use When
Open detail page | Review status, endpoint, metrics, health, hardware, and prediction information.
Scale deployment | Adjust replica bounds or runtime scaling settings.
Review logs | Investigate rollout, build, runtime, or health-check failures.
Run evaluation | Validate quality against a dataset before promotion.
Request promotion | Move a model/deployment toward staging or production with approval.
Rollback or redeploy | Recover from a failed release or unsafe configuration.
Delete deployment | Remove an unused endpoint after confirming no consumers depend on it.

Monitoring surfaces include deployment health, latency, throughput, errors, GPU usage, traces, cache behavior, and SLO status.

For API users, common endpoint groups include /api/v1/deployments, /api/v1/monitoring, /api/v1/inference/performance, /api/v1/inference/batch, and /api/v1/inference/queue.


Security and Governance

ModelOps governance is built around policy, evaluation, approval, and gate checks.

Capability | Purpose
Security scans | Run scan templates for supply chain, dependency, license, PII, jailbreak, prompt injection, adversarial, or OWASP-style categories.
Policy checks | Validate model or deployment changes against organization policies.
Pre-deployment gates | Enforce minimum eval score, vulnerability limits, policy pass status, approval requirements, or recent InferenceIQ verdicts.
Promotion requests | Request approval to move a model or deployment into staging or production.
Approval workflows | Track who approved, rejected, or applied a governed action.
Lineage | See where models came from and where they are deployed.
Audit logs | Preserve operational and compliance history.

Gate rule types include:

Rule Type | Meaning
min_eval_score | Require a minimum evaluation score before deployment or promotion.
max_critical_vulns | Block when critical vulnerability count exceeds policy.
policy_checks_pass | Require policy checks to pass.
approval_required | Require human approval.
inferenceiq_recent_verdict | Require a recent acceptable InferenceIQ verdict.
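
The rule types can be read as predicates over the evidence collected for a model version. The sketch below evaluates them that way. The rule type names come from the table above, while the shapes of the rules and evidence dictionaries, and the treatment of missing evidence as a failure, are assumptions for illustration.

```python
from typing import Dict, List


def failed_rules(rules: Dict[str, object], evidence: Dict[str, object]) -> List[str]:
    """Return the rule types that block deployment; missing evidence counts as failure."""
    failed = []
    if "min_eval_score" in rules:
        score = evidence.get("eval_score")
        if score is None or score < rules["min_eval_score"]:
            failed.append("min_eval_score")
    if "max_critical_vulns" in rules:
        vulns = evidence.get("critical_vulns")
        if vulns is None or vulns > rules["max_critical_vulns"]:
            failed.append("max_critical_vulns")
    if rules.get("policy_checks_pass") and not evidence.get("policy_checks_passed", False):
        failed.append("policy_checks_pass")
    if rules.get("approval_required") and not evidence.get("approved", False):
        failed.append("approval_required")
    if "inferenceiq_recent_verdict" in rules:
        max_age = rules["inferenceiq_recent_verdict"]["max_age_days"]
        age = evidence.get("inferenceiq_verdict_age_days")
        verdict_ok = evidence.get("inferenceiq_verdict") == "pass"
        if not verdict_ok or age is None or age > max_age:
            failed.append("inferenceiq_recent_verdict")
    return failed


# Hypothetical production gate and evidence for one model version.
rules = {
    "min_eval_score": 0.85,
    "max_critical_vulns": 0,
    "policy_checks_pass": True,
    "approval_required": True,
    "inferenceiq_recent_verdict": {"max_age_days": 14},
}
evidence = {
    "eval_score": 0.91,
    "critical_vulns": 1,
    "policy_checks_passed": True,
    "approved": False,
    "inferenceiq_verdict": "pass",
    "inferenceiq_verdict_age_days": 3,
}
print(failed_rules(rules, evidence))  # ['max_critical_vulns', 'approval_required']
```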

Production guidance:

  1. Keep production models on approved versions.
  2. Require evaluation and security evidence before promotion.
  3. Use GitOps for reviewable manifest changes.
  4. Avoid permanent gate overrides.
  5. Restrict public endpoints with TLS, auth, rate limits, logging, and monitoring.

InferenceIQ Optimization

InferenceIQ is the ModelOps optimization workspace for serving performance, cost, and reliability. It can be used before deployment, during tuning, or after production monitoring shows bottlenecks.

Main InferenceIQ Areas

Area | Path | Use It For
Dashboard | /modelops/inferenceiq | See stats, active experiments, tools, and recent plans.
Planner | /modelops/inferenceiq/planner | Generate AI-assisted optimization plans.
Experiments | /modelops/inferenceiq/experiments | Run, inspect, favorite, and compare optimization experiments.
Runs | /modelops/inferenceiq/runs/{runId} | Review optimization run details.
Benchmarking | /modelops/inferenceiq/benchmark | Validate latency, throughput, and quality.
Registry | /modelops/inferenceiq/registry | Review validated configurations and reusable optimization knowledge.
Monitoring | /modelops/inferenceiq/monitor | Watch production inference metrics and feedback signals.
Governance settings | /modelops/inferenceiq/settings/governance | Configure optimization governance.

Optimization Tools

InferenceIQ organizes core optimization capabilities as L0-L8 tools:

Level | Tool | Typical Use
L0 | Hardware Profiling | Match model, GPU, memory, and deployment context.
L1 | Kernel Tuning | Tune FlashAttention, PagedAttention, fused ops, and engine kernels.
L2 | Quantization | Reduce precision to improve memory use, speed, or cost.
L3 | Parallelism | Choose tensor, pipeline, or data parallelism strategies.
L4 | Speculative Decoding | Use draft models to reduce generation latency.
L5 | Batching/Scheduling | Tune continuous batching, scheduling, and KV-cache behavior.
L6 | Sparsity | Apply sparse compute or weight strategies.
L7 | Pruning | Remove redundant model structure.
L8 | Distillation | Transfer behavior into a smaller model.

InferenceIQ can order these tools differently for throughput, latency, or cost objectives. Some tools can apply configuration directly to deployment settings; others require validation before production use.

Recommended loop:

  1. Start with the model and deployment goal: throughput, latency, or cost.
  2. Run L0 hardware profiling or use existing telemetry.
  3. Run one or more optimization experiments.
  4. Benchmark candidates against your quality floor.
  5. Save validated configurations to the registry.
  6. Apply a validated configuration in the deployment wizard or deployment detail flow.
  7. Monitor production behavior and feed results back into future plans.
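
Step 4 of the loop, benchmarking candidates against a quality floor, can be pictured as a filter followed by a ranking on the chosen objective. The sketch below is not how InferenceIQ selects configurations; the metric names and all benchmark numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    p95_latency_ms: float
    tokens_per_second: float
    cost_per_1k_tokens: float
    quality_score: float


def pick_candidate(candidates: List[Candidate], objective: str, quality_floor: float) -> Optional[Candidate]:
    """Drop candidates below the quality floor, then rank the rest by the objective."""
    eligible = [c for c in candidates if c.quality_score >= quality_floor]
    if not eligible:
        return None
    if objective == "latency":
        return min(eligible, key=lambda c: c.p95_latency_ms)
    if objective == "throughput":
        return max(eligible, key=lambda c: c.tokens_per_second)
    if objective == "cost":
        return min(eligible, key=lambda c: c.cost_per_1k_tokens)
    raise ValueError(f"unknown objective: {objective}")


# Made-up benchmark results.
candidates = [
    Candidate("baseline-fp16", 420.0, 1800.0, 0.40, 0.92),
    Candidate("fp8-quantized", 310.0, 2600.0, 0.28, 0.91),
    Candidate("int4-quantized", 260.0, 3100.0, 0.21, 0.84),
]
best = pick_candidate(candidates, objective="latency", quality_floor=0.90)
print(best.name if best else "no candidate meets the quality floor")
```

With a floor of 0.90 the fastest candidate is excluded because its quality score falls short, which is the point of applying the floor before ranking.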

Common Workflows

Register and Deploy a Hugging Face LLM

  1. Open ModelOps -> Register Model.
  2. Choose Hugging Face and enter the model ID.
  3. Review source analysis warnings, size, framework, and recommended engine.
  4. Set name, display name, visibility, and tags.
  5. Choose Chat or Completion and a serving engine such as vLLM or TGI.
  6. Submit the registration.
  7. Trigger a serving image build if one is required.
  8. Open Deployments -> Create.
  9. Select Kubernetes, Docker, or hosted target.
  10. Select the model version and successful build.
  11. Configure resources, networking, health, compliance, and GitOps.
  12. Review and deploy.
  13. Watch the deployment detail page until status is healthy.

Promote a Model to Production

  1. Confirm the model version is approved or request approval.
  2. Run required evaluation suites and security scans.
  3. Check gate readiness for the production environment.
  4. Resolve failed rules or create a justified override.
  5. Submit a promotion request with evaluation evidence.
  6. Deploy through GitOps if required.
  7. Monitor SLO compliance after rollout.

Tune an Existing Deployment

  1. Open InferenceIQ.
  2. Choose the relevant objective and tool sequence.
  3. Run experiments or open Planner for recommendations.
  4. Benchmark the best candidate.
  5. Save or validate the resulting config.
  6. Apply the config to a deployment or use it in the deployment wizard.
  7. Compare production telemetry before and after the change.

Investigate a Failing Deployment

  1. Open the deployment detail page.
  2. Check high-level status, recent events, and health probes.
  3. Open deployment logs.
  4. Review build status if the image failed to pull or start.
  5. Check resource feasibility and GPU availability.
  6. Inspect networking, auth, TLS, and rate-limit configuration for endpoint failures.
  7. Review monitoring metrics for latency, errors, saturation, or crash loops.
  8. Roll back, scale, or redeploy after identifying the cause.

Troubleshooting

Symptom | Likely Cause | What To Check
Model source analysis fails | Bad URI, missing integration, unsupported source, private repository, or invalid credentials. | Source URI, integration credentials, access permissions, and source-specific warnings.
Registration succeeds but build is missing | Docker image source was used, build was not triggered, or build failed. | Build Status and /modelops/builds/{buildId} logs.
Build fails | Dependency conflict, missing base image, registry auth failure, large model packaging issue. | Dockerfile preview, build logs, registry settings, dependency list, and model size strategy.
Cannot proceed in deployment wizard | Required field missing for the active runtime. | Target selection, model ID, resource GPU type/count, inference engine, VM/cluster/provider selection.
Kubernetes target is infeasible | Cluster lacks GPU, memory, node pool, or topology required by the model. | Resource recommendation banner, cluster capabilities, node pool cards, and GPU telemetry.
Endpoint is unreachable | Ingress, service type, TLS, auth, DNS, or network policy mismatch. | Networking step, ingress host/path, auth method, generated policies, and access logs.
Deployment is unhealthy | Probe path/port mismatch, slow model load, insufficient resources, image pull failure, bad environment variables. | Health step, logs, events, readiness/liveness settings, image pull secrets, and weights mode.
Gate blocks deployment | Required eval, scan, policy, approval, or InferenceIQ verdict is missing or failed. | Gates page, latest gate evaluation, policy check requests, scan results, and approvals.
InferenceIQ cannot apply a config | Config not validated, incompatible engine, missing model/deployment target, or governance policy blocks apply. | InferenceIQ registry status, model engine, deployment resources, and governance settings.

Quick Reference

Key Frontend Paths

Task | Path
ModelOps home | /modelops
Register model | /modelops/register-model
Registry | /modelops/registry
Create deployment | /modelops/deployments/create
Deployment list | /modelops/deployments
Monitoring | /modelops/monitoring
Security scans | /modelops/security-scans
Policies | /modelops/policies
Gates | /modelops/gates
InferenceIQ | /modelops/inferenceiq

Key API Groups

All paths below are under the ModelOps API prefix /api/v1.

API Group | Purpose
/models | Model registry, source analysis, versions, approvals, aliases, deletion preview, lineage helpers.
/builds | Dockerfile preview, builds, logs, retries, registry push operations.
/deployments | Deployment records and lifecycle operations.
/deployments-wizard | Wizard-driven deployment creation and manifest generation.
/deployment-templates | Deployment template discovery and management.
/gates | Gate rules, readiness, evaluations, overrides, default seeding.
/security-scan-templates and /security-scan-requests | Security scan templates and scan execution.
/policy-check-requests | Policy check execution and status.
/promotion-requests | Promotion workflow requests.
/evaluation-requests | Evaluation orchestration over model or deployment targets.
/inferenceiq/* | InferenceIQ dashboard, planner, experiments, tools, registry, benchmark, monitoring, apply, and feedback APIs.

Operational Defaults

Setting | Default
Deployment environment from wizard | production unless changed by a specific flow.
Kubernetes namespace | default until changed.
Service port / target port | 8000.
Liveness probe | Enabled, /health, initial delay 30s, period 10s.
Readiness probe | Enabled, /health, initial delay 15s, period 5s.
Audit retention in compliance step | 90 days.
Access-control policy in compliance step | RBAC.
GitOps branch | main.
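
When scripting against these defaults, it can help to hold them in one structure and override only what differs. The values below are taken from the table above; the key names and the with_overrides helper are hypothetical and do not mirror the wizard payload.

```python
import copy
from typing import Dict

# Values from the Operational Defaults table; key names are illustrative.
WIZARD_DEFAULTS: Dict[str, object] = {
    "environment": "production",
    "namespace": "default",
    "service_port": 8000,
    "target_port": 8000,
    "liveness_probe": {"enabled": True, "path": "/health", "initial_delay_s": 30, "period_s": 10},
    "readiness_probe": {"enabled": True, "path": "/health", "initial_delay_s": 15, "period_s": 5},
    "audit_retention_days": 90,
    "access_control_policy": "RBAC",
    "gitops_branch": "main",
}


def with_overrides(overrides: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the defaults with overrides applied; probe dicts merge key by key."""
    merged = copy.deepcopy(WIZARD_DEFAULTS)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


# A large model may need a longer readiness delay than the 15s default.
config = with_overrides({"namespace": "ml-serving", "readiness_probe": {"initial_delay_s": 60}})
print(config["namespace"], config["readiness_probe"])
```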