ModelOps User Guide
ModelOps is Inwire's production model lifecycle workspace. Use it to register model artifacts, review versions, build serving images, deploy models to Kubernetes, Docker, or hosted inference providers, and operate deployments with monitoring, security, governance, and InferenceIQ optimization.
This guide is written for ML engineers, platform engineers, and MLOps operators who use the Inwire web app. API paths are included where they help explain what a workflow does behind the scenes.
Table of Contents
- What ModelOps Manages
- Before You Start
- Navigate ModelOps
- Register a Model
- Review Models and Versions
- Build Serving Images
- Deploy a Model
- Operate Deployments
- Security and Governance
- InferenceIQ Optimization
- Common Workflows
- Troubleshooting
- Quick Reference
What ModelOps Manages
ModelOps manages the part of the ML lifecycle that starts when a model is ready to become a governed production artifact.
| Area | What You Do |
|---|---|
| Model registry | Register models, manage versions, metadata, visibility, approvals, aliases, and lineage. |
| Builds | Generate Dockerfiles, build serving images, push images to registries, and review build logs. |
| Deployments | Deploy models to Kubernetes clusters, Docker VM targets, or hosted inference providers. |
| Manifests and GitOps | Generate manifests, store versions, open pull requests, and sync through ArgoCD. |
| Monitoring | Track deployment health, latency, throughput, errors, GPU usage, traces, cache behavior, and SLO status. |
| Governance | Run evaluations, policy checks, security scans, approvals, promotion requests, and pre-deployment gates. |
| InferenceIQ | Profile hardware, run optimization experiments, validate configurations, and apply safe serving improvements. |
ModelOps is multi-tenant. Most resources are scoped to your organization and, where configured, your team. What you can see or change depends on your role and your organization's policies.
Before You Start
Make sure these prerequisites are in place before using ModelOps for a production deployment:
| Requirement | Why It Matters |
|---|---|
| Inwire account and organization access | ModelOps actions are authenticated and scoped to your org/team. |
| Model artifact source | Registration requires a Hugging Face model ID, Git repository, cloud object path, upload, or existing Docker image. |
| Serving target | Deployment requires a Kubernetes cluster, Docker VM target, or hosted inference provider. |
| Container registry or managed image flow | Non-Docker-source models usually need a built serving image. |
| Secrets and integrations | GitHub, ArgoCD, cloud, storage, registry, or hosted provider credentials must be connected before use. |
| Promotion policy awareness | Production deployments may require evaluations, security scans, approvals, or gate overrides. |
Default local development URLs:
| Surface | URL |
|---|---|
| Frontend | http://localhost:3000/modelops |
| ModelOps API | http://localhost:18087 |
| ModelOps API docs | http://localhost:18087/docs (when API docs are enabled) |
| Prometheus metrics | http://localhost:18087/metrics (when metrics are enabled) |
Navigate ModelOps
Open ModelOps from the Inwire sidebar. The landing page highlights primary actions and status cards.
| Page | Path | Use It For |
|---|---|---|
| Register Model | /modelops/register-model | Add a model from Hugging Face, Git, cloud storage, upload, or Docker image. |
| Model Registry | /modelops/registry | Browse models, inspect versions, compare metadata, and start deploy flows. |
| Deployments | /modelops/deployments | View active deployments, create deployments, open deployment details, and manage lifecycle actions. |
| Monitoring | /modelops/monitoring | Review platform and model serving metrics. |
| Security | /modelops/security | Review vulnerability and security posture. |
| Policies | /modelops/policies | Manage policy definitions. |
| Gates | /modelops/gates | Configure and review pre-deployment gates. |
| Governance | /modelops/governance | Review approvals, compliance checks, and lineage. |
| InferenceIQ | /modelops/inferenceiq | Run performance optimization and validation workflows. |
| Deployment Logs | /modelops/deployment-logs | Investigate deployment history and logs. |
Register a Model
Use ModelOps -> Register Model to create a registry entry. The current registration wizard has four steps: Source, Identity, Runtime, and Review.
Step 1: Source
Choose where the model comes from.
| Source | Use When | Typical Input |
|---|---|---|
| Hugging Face | The model is published or mirrored in Hugging Face format. | meta-llama/Llama-3.1-8B-Instruct |
| GitHub / GitLab | The model package and serving code live in a repository. | Repository URL |
| Cloud Storage | Weights are stored in S3, GCS, or Azure Blob. | Bucket/object URI plus integration details |
| Upload | You have a smaller local artifact suitable for direct upload. | File selection |
| Docker Image | You already have a serving image. | Image reference such as registry/org/model:v1 |
After you enter a source, ModelOps can analyze it through POST /api/v1/models/analyze. The analysis may detect framework, architecture, model type, size class, storage strategy, recommended engine, required dependencies, and whether custom code or trust_remote_code is involved.
Review warnings carefully. A source can be syntactically valid but still risky for production if it requires remote code, has incomplete metadata, or lacks a compatible serving engine.
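As a rough illustration, the analysis call could be prepared as shown below. Only the path POST /api/v1/models/analyze and the local base URL come from this guide. The payload field names (`source_type`, `source_uri`) and the bearer-token header are assumptions made for the sketch, so check the API docs for the actual schema. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:18087"  # default local ModelOps API URL from this guide


def build_analyze_request(source_type: str, source_uri: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a source-analysis request."""
    # Hypothetical payload: the real field names may differ.
    payload = {"source_type": source_type, "source_uri": source_uri}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/models/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_analyze_request("huggingface", "meta-llama/Llama-3.1-8B-Instruct", "example-token")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. The response would then be the place to look for the warnings described above.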
Step 2: Identity
Set the model identity:
| Field | Guidance |
|---|---|
| Name | Stable registry name. Use lowercase, descriptive names for automation-friendly references. |
| Display name | Human-readable label for dashboards and reviews. |
| Description | Explain purpose, expected inputs, owner, and intended environment. |
| Visibility | Choose Private, Team, Organization, or Public according to data and model policy. |
| Tags | Add framework, domain, owner, risk, stage, or workload tags. |
Step 3: Runtime
Select the interface and serving engine.
| Interface | Use For |
|---|---|
| Chat | OpenAI-compatible chat completions. |
| Completion | Text completion or generation APIs. |
| Embedding | Vector embedding models. |
| Custom | Non-standard interfaces or custom handlers. |
Common serving engines:
| Engine | Best Fit |
|---|---|
| vLLM | High-throughput LLM serving. |
| Text Generation Inference (TGI) | Hugging Face text generation workloads. |
| Text Embeddings Inference (TEI) | Embedding workloads. |
| Triton Inference Server | Multi-framework GPU inference. |
| ONNX Runtime | Cross-platform optimized inference. |
Docker-image sources can skip runtime engine selection because the image already defines runtime behavior.
Step 4: Review
Confirm source, identity, runtime, visibility, tags, and any source-analysis warnings. Submitting creates a model registry record through POST /api/v1/models/register-v2 or the active model creation endpoint configured for the environment.
After registration, use the registry detail page to inspect status, versions, metadata, build readiness, evaluations, and deployment history.
Review Models and Versions
Use ModelOps -> Model Registry to find and manage registered models.
Common registry tasks:
| Task | Where |
|---|---|
| Search or filter models | Registry list |
| Open model details | Registry row or card |
| Compare models | ModelOps -> Model Comparison |
| Add a model version | Model detail -> Create Version |
| Review evaluation history | Model detail -> Evaluation history |
| Manage approval state | Model detail or governance workflows |
| Start a deployment | Model detail -> Deploy |
Model versions may carry approval state, aliases, build references, evaluation results, lineage, and deployment history. Use aliases for stable references such as staging, production, or latest-approved instead of relying on changing version IDs in operational runbooks.
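The value of aliases can be shown with a toy lookup. The alias table and version IDs below are hypothetical and do not reflect how ModelOps stores aliases. The point is only that a runbook which names an alias keeps working when the underlying version changes.

```python
# Hypothetical alias table for one model: alias -> version ID.
aliases = {"staging": "v7", "production": "v5", "latest-approved": "v7"}


def resolve(reference: str) -> str:
    """Return the version ID for an alias, or the reference itself if it is not an alias."""
    return aliases.get(reference, reference)


before = resolve("production")
aliases["production"] = "v7"  # a promotion only changes the alias mapping
after = resolve("production")
print(before, after)
```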
Build Serving Images
Many registered models need a serving image before deployment. Use the build pages when ModelOps needs to package the runtime, dependencies, model loader, and inference server.
| Page | Use It For |
|---|---|
| /modelops/builds/trigger | Start a model image build. |
| /modelops/builds/{buildId} | Watch status, logs, retries, push state, and failures. |
| /modelops/register-model/build-status | Review registration-related build status. |
Build capabilities include:
- Dockerfile preview with POST /api/v1/builds/preview-dockerfile.
- Build trigger with POST /api/v1/builds/trigger.
- Registry listing with GET /api/v1/builds/registries.
- Build status and logs through GET /api/v1/builds/{build_id} and GET /api/v1/builds/{build_id}/logs.
- Retry, push, retry-push, and cancel actions for failed or interrupted builds.
Before deploying, confirm the build is successful and either linked to the model version or selected explicitly in the deployment wizard.
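For automation, waiting on a build usually comes down to polling the status endpoint until it reports a terminal state. The sketch below shows that loop under stated assumptions: the status strings (`queued`, `building`, `succeeded`, `failed`, `cancelled`) are made up for illustration, and `fetch_status` stands in for whatever client code calls GET /api/v1/builds/{build_id}.

```python
import time
from typing import Callable

# Assumed status names; the real API may use different values.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_build(fetch_status: Callable[[], str], interval_s: float = 5.0, max_polls: int = 120) -> str:
    """Poll until the build reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("build did not finish within the polling budget")


# Stand-in for the real status call; replays canned statuses instead of using the network.
canned = iter(["queued", "building", "succeeded"])
final_status = wait_for_build(lambda: next(canned), interval_s=0.0)
print(final_status)
```

Bounding the number of polls keeps a stuck build from blocking a pipeline indefinitely. On `failed`, the build logs are the next place to look.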
Deploy a Model
Use ModelOps -> Deployments -> Create. You can also start from a model detail page with a preselected model.
The deployment wizard adapts to the target runtime:
| Runtime | Steps | Best For |
|---|---|---|
| Kubernetes | Target, Model, Resources, Networking, Health & Reliability, Compliance, GitOps, Review | Production services on managed or private clusters. |
| Docker | Target, Model, Configuration, Compose Preview, Review | Fast iteration on Docker VM targets. |
| Hosted | Target, Model, Provider Config, Review | Managed inference providers where cluster operations are externalized. |
Step 1: Target
Choose a deployment target.
| Target Type | Options |
|---|---|
| Cloud Kubernetes | AWS, GCP, Azure, Alibaba Cloud, Oracle Cloud, Nebius Kubernetes where configured. |
| On-premises / private | Kubernetes, OpenShift, K3s, Minikube, or Docker VM targets connected through Inwire agents. |
| Hosted inference | SageMaker, Vertex AI, Azure ML, Nebius AI Studio, Alibaba PAI-EAS, Together AI, Fireworks, Hugging Face, Replicate, Baseten, RunPod. |
For Kubernetes targets, select the connected cluster and namespace. For Docker targets, select one or more VM targets. For hosted targets, selecting the provider sets the runtime to hosted.
Step 2: Model
Select the model, model version, and optionally a build image, template, or source experiment. Prefer approved model versions for staging and production. If a required build is missing, return to the build flow before continuing.
Kubernetes Step 3: Resources
Configure serving resources:
| Field | Guidance |
|---|---|
| GPU type and count | Match model size, engine requirements, and cluster availability. |
| CPU and memory | Reserve enough headroom for tokenization, model server overhead, and sidecars. |
| Replicas | Start with minimum and maximum bounds that match the traffic plan. |
| Inference engine | Choose vLLM, TGI, Triton, TEI, ONNX, or another catalog entry. |
| Autoscaling targets | Configure CPU, memory, and optional custom metric thresholds. |
| InferenceIQ config | Apply a validated InferenceIQ configuration when available. |
The wizard can run cluster capability checks and suggest resource settings. If a check reports the requested resources as infeasible for the selected cluster, treat that as a blocker unless you intentionally override it with an approved capacity plan.
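A quick back-of-the-envelope check helps when matching GPU count to model size. The sketch below estimates memory for the weights alone (parameters times bytes per parameter) and derives a lower bound on GPU count. The `usable_fraction` margin is an assumed planning value, not a ModelOps setting. KV cache, activations, and engine overhead need memory beyond this bound.

```python
import math

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def min_gpus(num_params: float, dtype: str, gpu_memory_gib: float, usable_fraction: float = 0.9) -> int:
    """Lower bound on GPU count so the sharded weights fit.

    usable_fraction is an assumed planning margin; real deployments need
    extra room for KV cache and activations.
    """
    return math.ceil(weight_memory_gib(num_params, dtype) / (gpu_memory_gib * usable_fraction))


print(round(weight_memory_gib(8e9, "fp16"), 1))   # 8B parameters in fp16 -> 14.9
print(min_gpus(70e9, "fp16", gpu_memory_gib=80))  # 70B parameters on 80 GiB GPUs -> 2
```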
Kubernetes Step 4: Networking
Configure exposure, service, ingress, TLS, auth, rate limiting, network policies, access logs, and metrics.
| Exposure | Typical Result |
|---|---|
| Private | Cluster-internal access, usually ClusterIP. |
| Team | Team-scoped ingress or internal access controls. |
| Organization | Org-scoped ingress or access policy. |
| Public | External endpoint, usually with stricter TLS, auth, rate limits, and audit expectations. |
Authentication options include none, API key, JWT, or mTLS. Public endpoints should normally use authentication, TLS, access logging, metrics, and rate limiting.
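The public-endpoint guidance can be expressed as a simple pre-submit check. This is a sketch only: the dictionary keys (`exposure`, `auth`, `tls`, `access_logs`, `metrics`, `rate_limit`) are hypothetical names chosen for the example and are not the wizard's actual payload fields.

```python
def missing_public_controls(networking: dict) -> list[str]:
    """List the recommended controls that a public endpoint is missing."""
    if networking.get("exposure") != "public":
        return []  # the guidance here applies to public endpoints only
    missing = []
    if networking.get("auth", "none") == "none":
        missing.append("authentication")
    for key, label in [
        ("tls", "TLS"),
        ("access_logs", "access logging"),
        ("metrics", "metrics"),
        ("rate_limit", "rate limiting"),
    ]:
        if not networking.get(key):
            missing.append(label)
    return missing


config = {"exposure": "public", "auth": "api_key", "tls": True, "access_logs": True, "metrics": False}
print(missing_public_controls(config))
```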
Kubernetes Step 5: Health & Reliability
Configure liveness/readiness probes, deployment strategy, priority class, runtime class, pre-stop hooks, tenancy mode, weights loading mode, storage class, and GPU node pool.
Deployment strategies:
| Strategy | Use When |
|---|---|
| Rolling update | Standard low-risk updates where gradual replacement is acceptable. |
| Canary | You want partial traffic before full rollout. |
| Blue/Green | You need a clean switch between old and new versions. |
Weights loading modes:
| Mode | Use When |
|---|---|
| Init container | Weights are downloaded before the serving container starts. |
| Persistent volume claim | Weights should persist across pod restarts. |
| Node-local cache | Repeated deployments should reuse local node cache. |
| Streaming loader | Runtime supports streaming or lazy loading. |
Kubernetes Step 6: Compliance
Select compliance frameworks and enforcement details when the deployment handles regulated or sensitive data.
Supported framework labels include GDPR, HIPAA, SOC 2, PCI DSS, and ISO 27001. Configure data residency, encryption at rest, encryption in transit, audit retention, access-control policy, and sensitive-data handling.
If a deployment gate fails but business approval exists, use a documented gate override. Overrides should include a clear justification and, where possible, an expiration.
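To make the override guidance concrete, here is one way an override record could enforce a justification and honor an expiration. The class and its field names are hypothetical and are not the ModelOps override schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class GateOverride:
    """Hypothetical override record; field names are illustrative."""
    rule_type: str
    justification: str
    approved_by: str
    expires_at: Optional[datetime] = None  # None means no expiration

    def __post_init__(self) -> None:
        if not self.justification.strip():
            raise ValueError("a gate override needs a written justification")

    def is_active(self, now: datetime) -> bool:
        """An override applies only until its expiration, if one is set."""
        return self.expires_at is None or now < self.expires_at


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
override = GateOverride(
    rule_type="min_eval_score",
    justification="Eval suite outage; score verified manually by the model owner.",
    approved_by="release-manager",
    expires_at=now + timedelta(days=7),
)
print(override.is_active(now))                      # True
print(override.is_active(now + timedelta(days=8)))  # False
```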
Kubernetes Step 7: GitOps
Enable GitOps when deployments should flow through repository review and ArgoCD sync instead of direct deploy.
Configure:
- GitHub integration.
- Repository and branch.
- Manifest path.
- ArgoCD instance and application name.
- Whether ModelOps should create a pull request.
Use GitOps for production and shared environments when your organization requires reviewable infrastructure changes.
Docker Step: Configuration and Compose Preview
For Docker targets, configure image/runtime values, resources, environment variables, health checks, and generated Docker Compose output. Use the Compose Preview step to inspect the generated service definition before submitting.
Docker deployments are useful for quick experiments, internal demos, and optimization loops. They are not a substitute for production Kubernetes controls unless your organization has explicitly approved Docker VM production targets.
Hosted Step: Provider Config
Hosted inference targets collect provider-specific settings such as instance type, accelerator, region, endpoint type, autoscaling bounds, service account or IAM role, scaling profile, or hardware tier.
Review provider-specific cost, security, and data-handling obligations before deployment. ModelOps tracks the deployment, but the provider owns parts of the runtime behavior.
Review and Submit
The review step summarizes the deployment payload and highlights target, model, resource, networking, compliance, GitOps, advanced, and hosted settings. Submitting creates the deployment through POST /api/v1/deployments-wizard and redirects to the deployment detail page.
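If you script deployments against the API instead of using the wizard, a pre-submit check for runtime-specific required fields can catch incomplete payloads early. The field names and the required-field lists below are assumptions that loosely follow the wizard steps in this guide. They are not the actual request schema for POST /api/v1/deployments-wizard.

```python
# Hypothetical required fields per runtime; the real schema may differ.
REQUIRED_FIELDS = {
    "kubernetes": ["cluster", "namespace", "model_id", "gpu_type", "gpu_count", "inference_engine"],
    "docker": ["vm_targets", "model_id"],
    "hosted": ["provider", "model_id"],
}


def missing_fields(payload: dict) -> list[str]:
    """Return the required fields that are absent or empty for the payload's runtime."""
    runtime = payload.get("runtime")
    if runtime not in REQUIRED_FIELDS:
        return ["runtime"]
    return [name for name in REQUIRED_FIELDS[runtime] if payload.get(name) in (None, "", [])]


draft = {
    "runtime": "kubernetes",
    "cluster": "prod-cluster",  # illustrative values
    "namespace": "default",
    "model_id": "example-model",
    "gpu_type": "",
    "gpu_count": 1,
}
print(missing_fields(draft))
```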
Operate Deployments
Use ModelOps -> Deployments to monitor and manage live deployments.
Common actions:
| Action | Use When |
|---|---|
| Open detail page | Review status, endpoint, metrics, health, hardware, and prediction information. |
| Scale deployment | Adjust replica bounds or runtime scaling settings. |
| Review logs | Investigate rollout, build, runtime, or health-check failures. |
| Run evaluation | Validate quality against a dataset before promotion. |
| Request promotion | Move a model/deployment toward staging or production with approval. |
| Rollback or redeploy | Recover from a failed release or unsafe configuration. |
| Delete deployment | Remove an unused endpoint after confirming no consumers depend on it. |
Monitoring surfaces include:
- Deployment health and status.
- Latency, throughput, and error rates.
- GPU, CPU, memory, cache, and queue metrics.
- Traces and request diagnostics.
- SLO compliance and optimization actions.
- Logs and event history.
For API users, common endpoint groups include /api/v1/deployments, /api/v1/monitoring, /api/v1/inference/performance, /api/v1/inference/batch, and /api/v1/inference/queue.
Security and Governance
ModelOps governance is built around policy, evaluation, approval, and gate checks.
| Capability | Purpose |
|---|---|
| Security scans | Run scan templates for supply chain, dependency, license, PII, jailbreak, prompt injection, adversarial, or OWASP-style categories. |
| Policy checks | Validate model or deployment changes against organization policies. |
| Pre-deployment gates | Enforce minimum eval score, vulnerability limits, policy pass status, approval requirements, or recent InferenceIQ verdicts. |
| Promotion requests | Request approval to move a model or deployment into staging or production. |
| Approval workflows | Track who approved, rejected, or applied a governed action. |
| Lineage | See where models came from and where they are deployed. |
| Audit logs | Preserve operational and compliance history. |
Gate rule types include:
| Rule Type | Meaning |
|---|---|
| min_eval_score | Require a minimum evaluation score before deployment or promotion. |
| max_critical_vulns | Block when critical vulnerability count exceeds policy. |
| policy_checks_pass | Require policy checks to pass. |
| approval_required | Require human approval. |
| inferenceiq_recent_verdict | Require a recent acceptable InferenceIQ verdict. |
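The rule types above can be read as a set of independent checks over the evidence collected for a release. The sketch below is one possible interpretation. The rule type names come from this guide, but the rule value shapes (for example, a maximum verdict age in days) and the evidence keys are assumptions, and the real gate engine may evaluate rules differently. Missing evidence is treated as a failure.

```python
from datetime import datetime, timedelta, timezone


def failed_gate_rules(rules: dict, evidence: dict, now: datetime) -> list[str]:
    """Return the rule types that block the release; an empty list means the gate passes."""
    failed = []
    if "min_eval_score" in rules:
        score = evidence.get("eval_score")
        if score is None or score < rules["min_eval_score"]:
            failed.append("min_eval_score")
    if "max_critical_vulns" in rules:
        vulns = evidence.get("critical_vulns")
        if vulns is None or vulns > rules["max_critical_vulns"]:
            failed.append("max_critical_vulns")
    if rules.get("policy_checks_pass") and not evidence.get("policy_checks_passed", False):
        failed.append("policy_checks_pass")
    if rules.get("approval_required") and not evidence.get("approved_by"):
        failed.append("approval_required")
    if "inferenceiq_recent_verdict" in rules:
        # Assumed rule value: maximum verdict age in days.
        max_age = timedelta(days=rules["inferenceiq_recent_verdict"])
        verdict_at = evidence.get("inferenceiq_verdict_at")
        stale = verdict_at is None or now - verdict_at > max_age
        if stale or evidence.get("inferenceiq_verdict") != "pass":
            failed.append("inferenceiq_recent_verdict")
    return failed


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
rules = {"min_eval_score": 0.8, "max_critical_vulns": 0, "policy_checks_pass": True,
         "approval_required": True, "inferenceiq_recent_verdict": 14}
evidence = {"eval_score": 0.86, "critical_vulns": 1, "policy_checks_passed": True,
            "approved_by": None, "inferenceiq_verdict": "pass",
            "inferenceiq_verdict_at": now - timedelta(days=3)}
print(failed_gate_rules(rules, evidence, now))
```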
Production guidance:
- Keep production models on approved versions.
- Require evaluation and security evidence before promotion.
- Use GitOps for reviewable manifest changes.
- Avoid permanent gate overrides.
- Restrict public endpoints with TLS, auth, rate limits, logging, and monitoring.
InferenceIQ Optimization
InferenceIQ is the ModelOps optimization workspace for serving performance, cost, and reliability. It can be used before deployment, during tuning, or after production monitoring shows bottlenecks.
Main InferenceIQ Areas
| Area | Path | Use It For |
|---|---|---|
| Dashboard | /modelops/inferenceiq | See stats, active experiments, tools, and recent plans. |
| Planner | /modelops/inferenceiq/planner | Generate AI-assisted optimization plans. |
| Experiments | /modelops/inferenceiq/experiments | Run, inspect, favorite, and compare optimization experiments. |
| Runs | /modelops/inferenceiq/runs/{runId} | Review optimization run details. |
| Benchmarking | /modelops/inferenceiq/benchmark | Validate latency, throughput, and quality. |
| Registry | /modelops/inferenceiq/registry | Review validated configurations and reusable optimization knowledge. |
| Monitoring | /modelops/inferenceiq/monitor | Watch production inference metrics and feedback signals. |
| Governance settings | /modelops/inferenceiq/settings/governance | Configure optimization governance. |
Optimization Tools
InferenceIQ organizes core optimization capabilities as L0-L8 tools:
| Level | Tool | Typical Use |
|---|---|---|
| L0 | Hardware Profiling | Match model, GPU, memory, and deployment context. |
| L1 | Kernel Tuning | Tune FlashAttention, PagedAttention, fused ops, and engine kernels. |
| L2 | Quantization | Reduce precision to improve memory use, speed, or cost. |
| L3 | Parallelism | Choose tensor, pipeline, or data parallelism strategies. |
| L4 | Speculative Decoding | Use draft models to reduce generation latency. |
| L5 | Batching/Scheduling | Tune continuous batching, scheduling, and KV-cache behavior. |
| L6 | Sparsity | Apply sparse compute or weight strategies. |
| L7 | Pruning | Remove redundant model structure. |
| L8 | Distillation | Transfer behavior into a smaller model. |
InferenceIQ can order these tools differently for throughput, latency, or cost objectives. Some tools can apply configuration directly to deployment settings; others require validation before production use.
Recommended loop:
- Start with the model and deployment goal: throughput, latency, or cost.
- Run L0 hardware profiling or use existing telemetry.
- Run one or more optimization experiments.
- Benchmark candidates against your quality floor.
- Save validated configurations to the registry.
- Apply a validated configuration in the deployment wizard or deployment detail flow.
- Monitor production behavior and feed results back into future plans.
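The benchmarking step in the loop above amounts to filtering candidates by a quality floor and then ranking the survivors by the chosen objective. The sketch below shows that selection logic. The `Candidate` fields and all numbers are invented for illustration. They are not InferenceIQ output or measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    quality: float          # benchmark quality score, higher is better
    throughput_tps: float   # tokens per second
    p95_latency_ms: float
    cost_per_hour: float


def pick_candidate(candidates: list[Candidate], objective: str, quality_floor: float) -> Optional[Candidate]:
    """Drop candidates below the quality floor, then pick the best one for the objective."""
    eligible = [c for c in candidates if c.quality >= quality_floor]
    if not eligible:
        return None
    if objective == "throughput":
        return max(eligible, key=lambda c: c.throughput_tps)
    if objective == "latency":
        return min(eligible, key=lambda c: c.p95_latency_ms)
    if objective == "cost":
        return min(eligible, key=lambda c: c.cost_per_hour)
    raise ValueError(f"unknown objective: {objective}")


# Made-up example values.
candidates = [
    Candidate("baseline", 0.91, 1200.0, 480.0, 4.0),
    Candidate("int8-quant", 0.88, 2100.0, 310.0, 2.9),
    Candidate("aggressive-prune", 0.79, 2600.0, 250.0, 2.5),
]
best = pick_candidate(candidates, "throughput", quality_floor=0.85)
print(best.name if best else None)
```

Applying the floor before ranking means a faster or cheaper candidate never wins by giving up quality the workload requires.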
Common Workflows
Register and Deploy a Hugging Face LLM
- Open ModelOps -> Register Model.
- Choose Hugging Face and enter the model ID.
- Review source analysis warnings, size, framework, and recommended engine.
- Set name, display name, visibility, and tags.
- Choose Chat or Completion and a serving engine such as vLLM or TGI.
- Submit the registration.
- Trigger a serving image build if one is required.
- Open Deployments -> Create.
- Select Kubernetes, Docker, or hosted target.
- Select the model version and successful build.
- Configure resources, networking, health, compliance, and GitOps.
- Review and deploy.
- Watch the deployment detail page until status is healthy.
Promote a Model to Production
- Confirm the model version is approved or request approval.
- Run required evaluation suites and security scans.
- Check gate readiness for the production environment.
- Resolve failed rules or create a justified override.
- Submit a promotion request with evaluation evidence.
- Deploy through GitOps if required.
- Monitor SLO compliance after rollout.
Tune an Existing Deployment
- Open InferenceIQ.
- Choose the relevant objective and tool sequence.
- Run experiments or open Planner for recommendations.
- Benchmark the best candidate.
- Save or validate the resulting config.
- Apply the config to a deployment or use it in the deployment wizard.
- Compare production telemetry before and after the change.
Investigate a Failing Deployment
- Open the deployment detail page.
- Check high-level status, recent events, and health probes.
- Open deployment logs.
- Review build status if the image failed to pull or start.
- Check resource feasibility and GPU availability.
- Inspect networking, auth, TLS, and rate-limit configuration for endpoint failures.
- Review monitoring metrics for latency, errors, saturation, or crash loops.
- Roll back, scale, or redeploy after identifying the cause.
Troubleshooting
| Symptom | Likely Cause | What To Check |
|---|---|---|
| Model source analysis fails | Bad URI, missing integration, unsupported source, private repository, or invalid credentials. | Source URI, integration credentials, access permissions, and source-specific warnings. |
| Registration succeeds but build is missing | Docker image source was used, build was not triggered, or build failed. | Build Status and /modelops/builds/{buildId} logs. |
| Build fails | Dependency conflict, missing base image, registry auth failure, large model packaging issue. | Dockerfile preview, build logs, registry settings, dependency list, and model size strategy. |
| Cannot proceed in deployment wizard | Required field missing for the active runtime. | Target selection, model ID, resource GPU type/count, inference engine, VM/cluster/provider selection. |
| Kubernetes target is infeasible | Cluster lacks GPU, memory, node pool, or topology required by the model. | Resource recommendation banner, cluster capabilities, node pool cards, and GPU telemetry. |
| Endpoint is unreachable | Ingress, service type, TLS, auth, DNS, or network policy mismatch. | Networking step, ingress host/path, auth method, generated policies, and access logs. |
| Deployment is unhealthy | Probe path/port mismatch, slow model load, insufficient resources, image pull failure, bad environment variables. | Health step, logs, events, readiness/liveness settings, image pull secrets, and weights mode. |
| Gate blocks deployment | Required eval, scan, policy, approval, or InferenceIQ verdict is missing or failed. | Gates page, latest gate evaluation, policy check requests, scan results, and approvals. |
| InferenceIQ cannot apply a config | Config not validated, incompatible engine, missing model/deployment target, or governance policy blocks apply. | InferenceIQ registry status, model engine, deployment resources, and governance settings. |
Quick Reference
Key Frontend Paths
| Task | Path |
|---|---|
| ModelOps home | /modelops |
| Register model | /modelops/register-model |
| Registry | /modelops/registry |
| Create deployment | /modelops/deployments/create |
| Deployment list | /modelops/deployments |
| Monitoring | /modelops/monitoring |
| Security scans | /modelops/security-scans |
| Policies | /modelops/policies |
| Gates | /modelops/gates |
| InferenceIQ | /modelops/inferenceiq |
Key API Groups
All paths below are under the ModelOps API prefix /api/v1.
| API Group | Purpose |
|---|---|
| /models | Model registry, source analysis, versions, approvals, aliases, deletion preview, lineage helpers. |
| /builds | Dockerfile preview, builds, logs, retries, registry push operations. |
| /deployments | Deployment records and lifecycle operations. |
| /deployments-wizard | Wizard-driven deployment creation and manifest generation. |
| /deployment-templates | Deployment template discovery and management. |
| /gates | Gate rules, readiness, evaluations, overrides, default seeding. |
| /security-scan-templates and /security-scan-requests | Security scan templates and scan execution. |
| /policy-check-requests | Policy check execution and status. |
| /promotion-requests | Promotion workflow requests. |
| /evaluation-requests | Evaluation orchestration over model or deployment targets. |
| /inferenceiq/* | InferenceIQ dashboard, planner, experiments, tools, registry, benchmark, monitoring, apply, and feedback APIs. |
Operational Defaults
| Setting | Default |
|---|---|
| Deployment environment from wizard | production unless changed by a specific flow. |
| Kubernetes namespace | default until changed. |
| Service port / target port | 8000. |
| Liveness probe | Enabled, /health, initial delay 30s, period 10s. |
| Readiness probe | Enabled, /health, initial delay 15s, period 5s. |
| Audit retention in compliance step | 90 days. |
| Access-control policy in compliance step | RBAC. |
| GitOps branch | main. |
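For readers who inspect generated manifests, the probe defaults above would correspond to Kubernetes container probe fields roughly as follows. The Kubernetes field names (`httpGet`, `initialDelaySeconds`, `periodSeconds`) are standard. It is an assumption that the probes target the default service port 8000, and ModelOps may render additional fields.

```python
import json

DEFAULT_PORT = 8000  # default service/target port from the table above


def http_probe(path: str, port: int, initial_delay_s: int, period_s: int) -> dict:
    """Build a Kubernetes HTTP probe definition."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay_s,
        "periodSeconds": period_s,
    }


container_probes = {
    "livenessProbe": http_probe("/health", DEFAULT_PORT, initial_delay_s=30, period_s=10),
    "readinessProbe": http_probe("/health", DEFAULT_PORT, initial_delay_s=15, period_s=5),
}
print(json.dumps(container_probes, indent=2))
```

Large models can take longer than 30 seconds to load, so the initial delays are a common setting to revisit when a deployment is reported unhealthy.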
Related Documentation
- Backend Services Overview
- InferenceIQ Experiment Engine
- InferenceIQ L1 Kernel Tuning
- InferenceIQ L4 Speculative Decoding
- Archived engineering analysis: analysis-docs/modelops/