ModelOps User Guide

ModelOps is Inwire's production model lifecycle workspace. Use it to register model artifacts, review versions, build serving images, deploy models to Kubernetes, Docker, or hosted inference providers, and operate deployments with monitoring, security, governance, and InferenceIQ optimization.

This guide is written for ML engineers, platform engineers, and MLOps operators who use the Inwire web app. API paths are included where they help explain what a workflow does behind the scenes.

What ModelOps Manages
Before You Start
Navigate ModelOps
Register a Model
Review Models and Versions
Build Serving Images
Deploy a Model
Operate Deployments
Security and Governance
InferenceIQ Optimization
Common Workflows
Troubleshooting
Quick Reference

What ModelOps Manages

ModelOps manages the part of the ML lifecycle that starts when a model is ready to become a governed production artifact.

Area	What You Do
Model registry	Register models, manage versions, metadata, visibility, approvals, aliases, and lineage.
Builds	Generate Dockerfiles, build serving images, push images to registries, and review build logs.
Deployments	Deploy models to Kubernetes clusters, Docker VM targets, or hosted inference providers.
Manifests and GitOps	Generate manifests, store versions, open pull requests, and sync through ArgoCD.
Monitoring	Track deployment health, latency, throughput, errors, GPU usage, traces, cache behavior, and SLO status.
Governance	Run evaluations, policy checks, security scans, approvals, promotion requests, and pre-deployment gates.
InferenceIQ	Profile hardware, run optimization experiments, validate configurations, and apply safe serving improvements.

ModelOps is multi-tenant. Most resources are scoped to your organization and, where configured, your team. What you can see or change depends on your role and your organization's policies.

Before You Start

Make sure these prerequisites are in place before using ModelOps for a production deployment:

Requirement	Why It Matters
Inwire account and organization access	ModelOps actions are authenticated and scoped to your org/team.
Model artifact source	Registration requires a Hugging Face model ID, Git repository, cloud object path, upload, or existing Docker image.
Serving target	Deployment requires a Kubernetes cluster, Docker VM target, or hosted inference provider.
Container registry or managed image flow	Non-Docker-source models usually need a built serving image.
Secrets and integrations	GitHub, ArgoCD, cloud, storage, registry, or hosted provider credentials must be connected before use.
Promotion policy awareness	Production deployments may require evaluations, security scans, approvals, or gate overrides.

Default local development URLs:

Surface	URL
Frontend	`http://localhost:3000/modelops`
ModelOps API	`http://localhost:18087`
ModelOps API docs	`http://localhost:18087/docs` when API docs are enabled
Prometheus metrics	`http://localhost:18087/metrics` when metrics are enabled

Navigate ModelOps

Open ModelOps from the Inwire sidebar. The landing page highlights primary actions and status cards.

Page	Path	Use It For
Register Model	`/modelops/register-model`	Add a model from Hugging Face, Git, cloud storage, upload, or Docker image.
Model Registry	`/modelops/registry`	Browse models, inspect versions, compare metadata, and start deploy flows.
Deployments	`/modelops/deployments`	View active deployments, create deployments, open deployment details, and manage lifecycle actions.
Monitoring	`/modelops/monitoring`	Review platform and model serving metrics.
Security	`/modelops/security`	Review vulnerability and security posture.
Policies	`/modelops/policies`	Manage policy definitions.
Gates	`/modelops/gates`	Configure and review pre-deployment gates.
Governance	`/modelops/governance`	Review approvals, compliance checks, and lineage.
InferenceIQ	`/modelops/inferenceiq`	Run performance optimization and validation workflows.
Deployment Logs	`/modelops/deployment-logs`	Investigate deployment history and logs.

Register a Model

Use ModelOps -> Register Model to create a registry entry. The current registration wizard has four steps: Source, Identity, Runtime, and Review.

Step 1: Source

Choose where the model comes from.

Source	Use When	Typical Input
Hugging Face	The model is published or mirrored in Hugging Face format.	`meta-llama/Llama-3.1-8B-Instruct`
GitHub / GitLab	The model package and serving code live in a repository.	Repository URL
Cloud Storage	Weights are stored in S3, GCS, or Azure Blob.	Bucket/object URI plus integration details
Upload	You have a smaller local artifact suitable for direct upload.	File selection
Docker Image	You already have a serving image.	Image reference such as `registry/org/model:v1`

After you enter a source, ModelOps can analyze it through POST /api/v1/models/analyze. The analysis may detect framework, architecture, model type, size class, storage strategy, recommended engine, required dependencies, and whether custom code or trust_remote_code is involved.

Review warnings carefully. A source can be syntactically valid but still risky for production if it requires remote code, has incomplete metadata, or lacks a compatible serving engine.
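
For readers scripting against the API, the following is a minimal sketch of how an analysis request might be assembled. It only builds the request and does not send it. The JSON field names `source_type` and `source_uri` are assumptions for illustration, not the documented schema, and a real call would also need your organization's authentication headers.

```python
import json
import urllib.request

# Local development base URL; replace it for other environments.
MODELOPS_API = "http://localhost:18087"


def build_analyze_request(source_type: str, source_uri: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/models/analyze.

    The payload keys are hypothetical placeholders; check the API docs
    for the real request schema.
    """
    body = json.dumps({"source_type": source_type, "source_uri": source_uri}).encode("utf-8")
    return urllib.request.Request(
        url=f"{MODELOPS_API}/api/v1/models/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("huggingface", "meta-llama/Llama-3.1-8B-Instruct")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response shape is not shown here because it depends on the deployed API version.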

Step 2: Identity

Set the model identity:

Field	Guidance
Name	Stable registry name. Use lowercase, descriptive names for automation-friendly references.
Display name	Human-readable label for dashboards and reviews.
Description	Explain purpose, expected inputs, owner, and intended environment.
Visibility	Choose Private, Team, Organization, or Public according to data and model policy.
Tags	Add framework, domain, owner, risk, stage, or workload tags.
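
One way to keep names automation-friendly is to derive the registry name from the display name. The helper below is a sketch under the assumption that lowercase letters, digits, and hyphens are acceptable; ModelOps may enforce different naming rules, and `to_registry_name` is a hypothetical function, not part of the product.

```python
import re


def to_registry_name(display_name: str) -> str:
    """Derive a lowercase, hyphen-separated name from a display name."""
    lowered = display_name.strip().lower()
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop hyphens left over at either end.
    return slug.strip("-")


print(to_registry_name("Llama 3.1 8B Instruct (Support Bot)"))
# prints: llama-3-1-8b-instruct-support-bot
```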

Step 3: Runtime

Select the interface and serving engine.

Interface	Use For
Chat	OpenAI-compatible chat completions.
Completion	Text completion or generation APIs.
Embedding	Vector embedding models.
Custom	Non-standard interfaces or custom handlers.

Common serving engines:

Engine	Best Fit
vLLM	High-throughput LLM serving.
Text Generation Inference (TGI)	Hugging Face text generation workloads.
Text Embeddings Inference (TEI)	Embedding workloads.
Triton Inference Server	Multi-framework GPU inference.
ONNX Runtime	Cross-platform optimized inference.

Docker-image sources can skip runtime engine selection because the image already defines runtime behavior.

Step 4: Review

Confirm source, identity, runtime, visibility, tags, and any source-analysis warnings. Submitting creates a model registry record through POST /api/v1/models/register-v2 or the active model creation endpoint configured for the environment.

After registration, use the registry detail page to inspect status, versions, metadata, build readiness, evaluations, and deployment history.

Review Models and Versions

Use ModelOps -> Model Registry to find and manage registered models.

Common registry tasks:

Task	Where
Search or filter models	Registry list
Open model details	Registry row or card
Compare models	ModelOps -> Model Comparison
Add a model version	Model detail -> Create Version
Review evaluation history	Model detail -> Evaluation history
Manage approval state	Model detail or governance workflows
Start a deployment	Model detail -> Deploy

Model versions may carry approval state, aliases, build references, evaluation results, lineage, and deployment history. Use aliases for stable references such as staging, production, or latest-approved instead of relying on changing version IDs in operational runbooks.
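
The value of aliases in a runbook can be illustrated with a small lookup. This is a sketch only: the alias table and version IDs below are made up, and `resolve_version` is a hypothetical helper rather than a ModelOps API.

```python
from typing import Dict

# Hypothetical alias table for one model; the version IDs are invented.
ALIASES: Dict[str, str] = {
    "staging": "v7",
    "production": "v5",
    "latest-approved": "v7",
}


def resolve_version(reference: str, aliases: Dict[str, str]) -> str:
    """Return the version ID behind an alias, or the reference itself if it is not an alias."""
    return aliases.get(reference, reference)


# A runbook that says "production" keeps working after the alias moves to a new version.
print(resolve_version("production", ALIASES))  # prints: v5
print(resolve_version("v3", ALIASES))          # prints: v3
```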

Build Serving Images

Many registered models need a serving image before deployment. Use the build pages when ModelOps needs to package the runtime, dependencies, model loader, and inference server.

Page	Use It For
`/modelops/builds/trigger`	Start a model image build.
`/modelops/builds/{buildId}`	Watch status, logs, retries, push state, and failures.
`/modelops/register-model/build-status`	Review registration-related build status.

Build capabilities include:

Dockerfile preview with POST /api/v1/builds/preview-dockerfile.
Build trigger with POST /api/v1/builds/trigger.
Registry listing with GET /api/v1/builds/registries.
Build status and logs through GET /api/v1/builds/{build_id} and GET /api/v1/builds/{build_id}/logs.
Retry, push, retry-push, and cancel actions for failed or interrupted builds.

Before deploying, confirm the build is successful and either linked to the model version or selected explicitly in the deployment wizard.
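
A script that waits for a build before deploying could poll the build status endpoint. The sketch below assumes the response contains a `status` field and that `succeeded`, `failed`, and `cancelled` are terminal values; both are assumptions. The HTTP call is injected as a function so the loop can be shown without a live API.

```python
import time
from typing import Callable, Dict

# Assumed terminal status values; the real API may use different names.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_build(
    fetch_status: Callable[[], Dict[str, str]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll until the build reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status().get("status", "unknown")
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    return "timeout"


# Stand-in for repeated GET /api/v1/builds/{build_id} calls.
_fake_responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "succeeded"}])
final_status = wait_for_build(lambda: next(_fake_responses), poll_seconds=0.0)
print(final_status)  # prints: succeeded
```

In real use, `fetch_status` would wrap an authenticated request to the build status endpoint and return the parsed JSON body.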

Deploy a Model

Use ModelOps -> Deployments -> Create. You can also start from a model detail page with a preselected model.

The deployment wizard adapts to the target runtime:

Runtime	Steps	Best For
Kubernetes	Target, Model, Resources, Networking, Health & Reliability, Compliance, GitOps, Review	Production services on managed or private clusters.
Docker	Target, Model, Configuration, Compose Preview, Review	Fast iteration on Docker VM targets.
Hosted	Target, Model, Provider Config, Review	Managed inference providers where cluster operations are externalized.

Step 1: Target

Choose a deployment target.

Target Type	Options
Cloud Kubernetes	Kubernetes on AWS, GCP, Azure, Alibaba Cloud, Oracle Cloud, or Nebius, where configured.
On-premises / private	Kubernetes, OpenShift, K3s, Minikube, or Docker VM targets connected through Inwire agents.
Hosted inference	SageMaker, Vertex AI, Azure ML, Nebius AI Studio, Alibaba PAI-EAS, Together AI, Fireworks, Hugging Face, Replicate, Baseten, RunPod.

For Kubernetes targets, select the connected cluster and namespace. For Docker targets, select one or more VM targets. For hosted targets, selecting the provider sets the runtime to hosted.

Step 2: Model

Select the model, model version, and optionally a build image, template, or source experiment. Prefer approved model versions for staging and production. If a required build is missing, return to the build flow before continuing.

Kubernetes Step 3: Resources

Configure serving resources:

Field	Guidance
GPU type and count	Match model size, engine requirements, and cluster availability.
CPU and memory	Reserve enough headroom for tokenization, model server overhead, and sidecars.
Replicas	Start with minimum and maximum bounds that match the traffic plan.
Inference engine	Choose vLLM, TGI, Triton, TEI, ONNX Runtime, or another catalog entry.
Autoscaling targets	Configure CPU, memory, and optional custom metric thresholds.
InferenceIQ config	Apply a validated InferenceIQ configuration when available.

The wizard can use cluster capability checks and resource recommendations. Treat infeasible recommendations as blockers unless you intentionally override with an approved capacity plan.

Kubernetes Step 4: Networking

Configure exposure, service, ingress, TLS, auth, rate limiting, network policies, access logs, and metrics.

Exposure	Typical Result
Private	Cluster-internal access, usually `ClusterIP`.
Team	Team-scoped ingress or internal access controls.
Organization	Org-scoped ingress or access policy.
Public	External endpoint, usually with stricter TLS, auth, rate limits, and audit expectations.

Authentication options include none, API key, JWT, or mTLS. Public endpoints should normally use authentication, TLS, access logging, metrics, and rate limiting.
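
The guidance for public endpoints can be expressed as a simple pre-submit check. This is an illustrative sketch: `NetworkingConfig` and its field names are hypothetical and do not reflect the wizard's actual payload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetworkingConfig:
    """Illustrative subset of networking settings; field names are hypothetical."""
    exposure: str        # "private", "team", "organization", or "public"
    auth: str            # "none", "api_key", "jwt", or "mtls"
    tls: bool
    rate_limiting: bool
    access_logs: bool
    metrics: bool


def missing_public_controls(config: NetworkingConfig) -> List[str]:
    """Return the recommended controls a public endpoint does not have enabled."""
    if config.exposure != "public":
        return []
    checks = {
        "authentication": config.auth != "none",
        "TLS": config.tls,
        "rate limiting": config.rate_limiting,
        "access logging": config.access_logs,
        "metrics": config.metrics,
    }
    return [control for control, enabled in checks.items() if not enabled]


draft = NetworkingConfig("public", "none", True, False, True, True)
print(missing_public_controls(draft))  # prints: ['authentication', 'rate limiting']
```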

Kubernetes Step 5: Health & Reliability

Configure liveness/readiness probes, deployment strategy, priority class, runtime class, pre-stop hooks, tenancy mode, weights loading mode, storage class, and GPU node pool.

Deployment strategies:

Strategy	Use When
Rolling update	Standard low-risk updates where gradual replacement is acceptable.
Canary	You want partial traffic before full rollout.
Blue/Green	You need a clean switch between old and new versions.

Weights loading modes:

Mode	Use When
Init container	Weights are downloaded before the serving container starts.
Persistent volume claim	Weights should persist across pod restarts.
Node-local cache	Repeated deployments should reuse local node cache.
Streaming loader	Runtime supports streaming or lazy loading.

Kubernetes Step 6: Compliance

Select compliance frameworks and enforcement details when the deployment handles regulated or sensitive data.

Supported framework labels include GDPR, HIPAA, SOC 2, PCI DSS, and ISO 27001. Configure data residency, encryption at rest, encryption in transit, audit retention, access-control policy, and sensitive-data handling.

If a deployment gate fails but business approval exists, use a documented gate override. Overrides should include a clear justification and, where possible, an expiration.
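
To make the justification and expiration guidance concrete, here is a sketch of how an override record might be modeled. The `GateOverride` class and its fields are assumptions for illustration, not the ModelOps data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class GateOverride:
    """Hypothetical override record; field names are illustrative."""
    gate_rule: str
    justification: str
    approved_by: str
    expires_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        """An override counts only with a justification and before it expires."""
        if not self.justification.strip():
            return False
        return self.expires_at is None or now < self.expires_at


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
override = GateOverride(
    gate_rule="min_eval_score",
    justification="Eval dataset refresh pending; approved for a two-week pilot.",
    approved_by="release-manager",
    expires_at=created + timedelta(days=14),
)
print(override.is_active(created + timedelta(days=7)))   # prints: True
print(override.is_active(created + timedelta(days=15)))  # prints: False
```

Giving every override an `expires_at` value keeps it from silently becoming permanent.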

Kubernetes Step 7: GitOps

Enable GitOps when deployments should flow through repository review and ArgoCD sync instead of direct deploy.

Configure:

GitHub integration.
Repository and branch.
Manifest path.
ArgoCD instance and application name.
Whether ModelOps should create a pull request.

Use GitOps for production and shared environments when your organization requires reviewable infrastructure changes.

Docker Step: Configuration and Compose Preview

For Docker targets, configure image and runtime values, resources, environment variables, and health checks. Then use the Compose Preview step to inspect the generated Docker Compose service definition before submitting.

Docker deployments are useful for quick experiments, internal demos, and optimization loops. They are not a substitute for production Kubernetes controls unless your organization has explicitly approved Docker VM production targets.

Hosted Step: Provider Config

Hosted inference targets collect provider-specific settings such as instance type, accelerator, region, endpoint type, autoscaling bounds, service account or IAM role, scaling profile, or hardware tier.

Review provider-specific cost, security, and data-handling obligations before deployment. ModelOps tracks the deployment, but the provider owns parts of the runtime behavior.

Review and Submit

The review step summarizes the deployment payload, grouped into target, model, resource, networking, compliance, GitOps, advanced, and hosted settings. Submitting creates the deployment through POST /api/v1/deployments-wizard and redirects to the deployment detail page.

Operate Deployments

Use ModelOps -> Deployments to monitor and manage live deployments.

Common actions:

Action	Use When
Open detail page	Review status, endpoint, metrics, health, hardware, and prediction information.
Scale deployment	Adjust replica bounds or runtime scaling settings.
Review logs	Investigate rollout, build, runtime, or health-check failures.
Run evaluation	Validate quality against a dataset before promotion.
Request promotion	Move a model/deployment toward staging or production with approval.
Rollback or redeploy	Recover from a failed release or unsafe configuration.
Delete deployment	Remove an unused endpoint after confirming no consumers depend on it.

Monitoring surfaces include:

Deployment health and status.
Latency, throughput, and error rates.
GPU, CPU, memory, cache, and queue metrics.
Traces and request diagnostics.
SLO compliance and optimization actions.
Logs and event history.

For API users, common endpoint groups include /api/v1/deployments, /api/v1/monitoring, /api/v1/inference/performance, /api/v1/inference/batch, and /api/v1/inference/queue.

Security and Governance

ModelOps governance is built around policy, evaluation, approval, and gate checks.

Capability	Purpose
Security scans	Run scan templates for supply chain, dependency, license, PII, jailbreak, prompt injection, adversarial, or OWASP-style categories.
Policy checks	Validate model or deployment changes against organization policies.
Pre-deployment gates	Enforce minimum eval score, vulnerability limits, policy pass status, approval requirements, or recent InferenceIQ verdicts.
Promotion requests	Request approval to move a model or deployment into staging or production.
Approval workflows	Track who approved, rejected, or applied a governed action.
Lineage	See where models came from and where they are deployed.
Audit logs	Preserve operational and compliance history.

Gate rule types include:

Rule Type	Meaning
`min_eval_score`	Require a minimum evaluation score before deployment or promotion.
`max_critical_vulns`	Block when critical vulnerability count exceeds policy.
`policy_checks_pass`	Require policy checks to pass.
`approval_required`	Require human approval.
`inferenceiq_recent_verdict`	Require a recent acceptable InferenceIQ verdict.
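
The rule types above can be read as a checklist evaluated against the latest evidence for a model or deployment. The sketch below shows one possible evaluation order. The rule type names come from the table, but the evidence keys (`eval_score`, `critical_vulns`, and so on) and the treatment of missing evidence as a failure are assumptions.

```python
from typing import Any, Dict, List


def evaluate_gates(rules: Dict[str, Any], evidence: Dict[str, Any]) -> List[str]:
    """Return the rule types that block the change; an empty list means all gates pass."""
    failures = []

    score = evidence.get("eval_score")
    if "min_eval_score" in rules and (score is None or score < rules["min_eval_score"]):
        failures.append("min_eval_score")

    vulns = evidence.get("critical_vulns")
    if "max_critical_vulns" in rules and (vulns is None or vulns > rules["max_critical_vulns"]):
        failures.append("max_critical_vulns")

    # Boolean rules: enforced only when the rule is switched on.
    boolean_rules = {
        "policy_checks_pass": "policy_checks_passed",
        "approval_required": "approved",
        "inferenceiq_recent_verdict": "inferenceiq_verdict_ok",
    }
    for rule_type, evidence_key in boolean_rules.items():
        if rules.get(rule_type) and not evidence.get(evidence_key, False):
            failures.append(rule_type)

    return failures


rules = {"min_eval_score": 0.85, "max_critical_vulns": 0, "approval_required": True}
evidence = {"eval_score": 0.90, "critical_vulns": 2, "approved": True}
print(evaluate_gates(rules, evidence))  # prints: ['max_critical_vulns']
```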

Production guidance:

Keep production models on approved versions.
Require evaluation and security evidence before promotion.
Use GitOps for reviewable manifest changes.
Avoid permanent gate overrides.
Restrict public endpoints with TLS, auth, rate limits, logging, and monitoring.

InferenceIQ Optimization

InferenceIQ is the ModelOps optimization workspace for serving performance, cost, and reliability. It can be used before deployment, during tuning, or after production monitoring shows bottlenecks.

Main InferenceIQ Areas

Area	Path	Use It For
Dashboard	`/modelops/inferenceiq`	See stats, active experiments, tools, and recent plans.
Planner	`/modelops/inferenceiq/planner`	Generate AI-assisted optimization plans.
Experiments	`/modelops/inferenceiq/experiments`	Run, inspect, favorite, and compare optimization experiments.
Runs	`/modelops/inferenceiq/runs/{runId}`	Review optimization run details.
Benchmarking	`/modelops/inferenceiq/benchmark`	Validate latency, throughput, and quality.
Registry	`/modelops/inferenceiq/registry`	Review validated configurations and reusable optimization knowledge.
Monitoring	`/modelops/inferenceiq/monitor`	Watch production inference metrics and feedback signals.
Governance settings	`/modelops/inferenceiq/settings/governance`	Configure optimization governance.

Optimization Tools

InferenceIQ organizes core optimization capabilities as L0-L8 tools:

Level	Tool	Typical Use
L0	Hardware Profiling	Match model, GPU, memory, and deployment context.
L1	Kernel Tuning	Tune FlashAttention, PagedAttention, fused ops, and engine kernels.
L2	Quantization	Reduce precision to improve memory use, speed, or cost.
L3	Parallelism	Choose tensor, pipeline, or data parallelism strategies.
L4	Speculative Decoding	Use draft models to reduce generation latency.
L5	Batching/Scheduling	Tune continuous batching, scheduling, and KV-cache behavior.
L6	Sparsity	Apply sparse compute or weight strategies.
L7	Pruning	Remove redundant model structure.
L8	Distillation	Transfer behavior into a smaller model.

InferenceIQ can order these tools differently for throughput, latency, or cost objectives. Some tools can apply configuration directly to deployment settings; others require validation before production use.

Recommended loop:

Start with the model and deployment goal: throughput, latency, or cost.
Run L0 hardware profiling or use existing telemetry.
Run one or more optimization experiments.
Benchmark candidates against your quality floor.
Save validated configurations to the registry.
Apply a validated configuration in the deployment wizard or deployment detail flow.
Monitor production behavior and feed results back into future plans.
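
The benchmarking step in this loop amounts to filtering candidates by the quality floor and then ranking the survivors by the chosen objective. The following sketch illustrates that selection. The `Candidate` fields and the sample numbers are invented for illustration and are not InferenceIQ output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical benchmark result for one optimization candidate."""
    name: str
    p95_latency_ms: float
    tokens_per_second: float
    cost_per_hour: float
    quality_score: float


# Each objective maps to a sort key where a smaller value is better.
RANKING = {
    "latency": lambda c: c.p95_latency_ms,
    "throughput": lambda c: -c.tokens_per_second,  # negate: higher throughput is better
    "cost": lambda c: c.cost_per_hour,
}


def pick_candidate(candidates: List[Candidate], objective: str, quality_floor: float) -> Optional[Candidate]:
    """Pick the best candidate for the objective among those meeting the quality floor."""
    if objective not in RANKING:
        raise ValueError(f"unknown objective: {objective}")
    eligible = [c for c in candidates if c.quality_score >= quality_floor]
    return min(eligible, key=RANKING[objective]) if eligible else None


results = [
    Candidate("fp16-baseline", 420.0, 1800.0, 4.0, 0.91),
    Candidate("int8-quantized", 310.0, 2600.0, 2.5, 0.89),
    Candidate("int4-quantized", 240.0, 3100.0, 2.0, 0.82),
]
best = pick_candidate(results, "latency", quality_floor=0.88)
print(best.name if best else "no candidate meets the quality floor")  # prints: int8-quantized
```

Note that the fastest candidate is rejected here because it falls below the quality floor, which is the point of benchmarking before saving a configuration.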

Common Workflows

Register and Deploy a Hugging Face LLM

Open ModelOps -> Register Model.
Choose Hugging Face and enter the model ID.
Review source analysis warnings, size, framework, and recommended engine.
Set name, display name, visibility, and tags.
Choose Chat or Completion and a serving engine such as vLLM or TGI.
Submit the registration.
Trigger a serving image build if one is required.
Open Deployments -> Create.
Select a Kubernetes, Docker, or hosted target.
Select the model version and successful build.
Complete the runtime-specific steps. For Kubernetes, configure resources, networking, health, compliance, and GitOps.
Review and deploy.
Watch the deployment detail page until status is healthy.

Promote a Model to Production

Confirm the model version is approved or request approval.
Run required evaluation suites and security scans.
Check gate readiness for the production environment.
Resolve failed rules or create a justified override.
Submit a promotion request with evaluation evidence.
Deploy through GitOps if required.
Monitor SLO compliance after rollout.

Tune an Existing Deployment

Open InferenceIQ.
Choose the relevant objective and tool sequence.
Run experiments or open Planner for recommendations.
Benchmark the best candidate.
Save or validate the resulting config.
Apply the config to a deployment or use it in the deployment wizard.
Compare production telemetry before and after the change.

Investigate a Failing Deployment

Open the deployment detail page.
Check high-level status, recent events, and health probes.
Open deployment logs.
Review build status if the image failed to pull or start.
Check resource feasibility and GPU availability.
Inspect networking, auth, TLS, and rate-limit configuration for endpoint failures.
Review monitoring metrics for latency, errors, saturation, or crash loops.
Roll back, scale, or redeploy after identifying the cause.

Troubleshooting

Symptom	Likely Cause	What To Check
Model source analysis fails	Bad URI, missing integration, unsupported source, private repository, or invalid credentials.	Source URI, integration credentials, access permissions, and source-specific warnings.
Registration succeeds but build is missing	Docker image source was used, build was not triggered, or build failed.	Build Status and `/modelops/builds/{buildId}` logs.
Build fails	Dependency conflict, missing base image, registry auth failure, or large-model packaging issue.	Dockerfile preview, build logs, registry settings, dependency list, and model size strategy.
Cannot proceed in deployment wizard	Required field missing for the active runtime.	Target selection, model ID, resource GPU type/count, inference engine, VM/cluster/provider selection.
Kubernetes target is infeasible	Cluster lacks GPU, memory, node pool, or topology required by the model.	Resource recommendation banner, cluster capabilities, node pool cards, and GPU telemetry.
Endpoint is unreachable	Ingress, service type, TLS, auth, DNS, or network policy mismatch.	Networking step, ingress host/path, auth method, generated policies, and access logs.
Deployment is unhealthy	Probe path/port mismatch, slow model load, insufficient resources, image pull failure, or bad environment variables.	Health step, logs, events, readiness/liveness settings, image pull secrets, and weights mode.
Gate blocks deployment	Required eval, scan, policy, approval, or InferenceIQ verdict is missing or failed.	Gates page, latest gate evaluation, policy check requests, scan results, and approvals.
InferenceIQ cannot apply a config	Config not validated, incompatible engine, missing model/deployment target, or governance policy blocks apply.	InferenceIQ registry status, model engine, deployment resources, and governance settings.

Quick Reference

Key Frontend Paths

Task	Path
ModelOps home	`/modelops`
Register model	`/modelops/register-model`
Registry	`/modelops/registry`
Create deployment	`/modelops/deployments/create`
Deployment list	`/modelops/deployments`
Monitoring	`/modelops/monitoring`
Security scans	`/modelops/security-scans`
Policies	`/modelops/policies`
Gates	`/modelops/gates`
InferenceIQ	`/modelops/inferenceiq`

Key API Groups

All paths below are under the ModelOps API prefix /api/v1.

API Group	Purpose
`/models`	Model registry, source analysis, versions, approvals, aliases, deletion preview, lineage helpers.
`/builds`	Dockerfile preview, builds, logs, retries, registry push operations.
`/deployments`	Deployment records and lifecycle operations.
`/deployments-wizard`	Wizard-driven deployment creation and manifest generation.
`/deployment-templates`	Deployment template discovery and management.
`/gates`	Gate rules, readiness, evaluations, overrides, default seeding.
`/security-scan-templates` and `/security-scan-requests`	Security scan templates and scan execution.
`/policy-check-requests`	Policy check execution and status.
`/promotion-requests`	Promotion workflow requests.
`/evaluation-requests`	Evaluation orchestration over model or deployment targets.
`/inferenceiq/*`	InferenceIQ dashboard, planner, experiments, tools, registry, benchmark, monitoring, apply, and feedback APIs.
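
When calling these groups from a script, it helps to build URLs in one place so the `/api/v1` prefix is applied consistently. The helper below is a small sketch; `api_url` is a hypothetical function and the build ID `abc123` is a placeholder.

```python
API_PREFIX = "/api/v1"


def api_url(base_url: str, group: str, *parts: str) -> str:
    """Join the base URL, the /api/v1 prefix, an API group, and optional path parts."""
    segments = [base_url.rstrip("/"), API_PREFIX.strip("/"), group.strip("/")]
    segments.extend(str(part).strip("/") for part in parts)
    return "/".join(segments)


# Local development base URL and a placeholder build ID.
print(api_url("http://localhost:18087", "/builds", "abc123", "logs"))
# prints: http://localhost:18087/api/v1/builds/abc123/logs
```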

Operational Defaults

Setting	Default
Deployment environment from wizard	`production` unless changed by a specific flow.
Kubernetes namespace	`default` until changed.
Service port / target port	`8000`.
Liveness probe	Enabled, `/health`, initial delay 30s, period 10s.
Readiness probe	Enabled, `/health`, initial delay 15s, period 5s.
Audit retention in compliance step	90 days.
Access-control policy in compliance step	RBAC.
GitOps branch	`main`.
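
For orientation, the probe defaults above map onto standard Kubernetes probe fields roughly as follows. This sketch assumes the probes target the default service port `8000`; the manifest ModelOps actually generates may differ, and `http_probe` is a hypothetical helper.

```python
from typing import Any, Dict


def http_probe(path: str, port: int, initial_delay_seconds: int, period_seconds: int) -> Dict[str, Any]:
    """Build a Kubernetes HTTP probe spec as a plain dictionary."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
    }


# Defaults from the table; the port is assumed to match the service target port.
liveness_probe = http_probe("/health", 8000, initial_delay_seconds=30, period_seconds=10)
readiness_probe = http_probe("/health", 8000, initial_delay_seconds=15, period_seconds=5)
print(liveness_probe)
print(readiness_probe)
```

Large models that load slowly may need a longer initial delay than these defaults, which is worth checking when a deployment is reported as unhealthy shortly after rollout.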

Backend Services Overview
InferenceIQ Experiment Engine
InferenceIQ L1 Kernel Tuning
InferenceIQ L4 Speculative Decoding
Archived engineering analysis: analysis-docs/modelops/