Serving and Inference Gateway User Guide

Manage inference endpoints, keys, requests, observability, incidents, policies, rollouts, snapshots, routing, and failover.

Who This Guide Is For

Where To Go

Page                      Use It For
/serving                  Serving overview.
/serving/endpoints        Endpoint catalog.
/serving/keys             Serving API keys.
/serving/requests         Request inspection.
/serving/observability    Serving metrics and traces.
/serving/policies         Routing and access policies.
/serving/rollouts         Canary, blue/green, and rollout controls.
/serving/incidents        Serving incidents.
/serving/snapshots        Endpoint and routing snapshots.
/serving/rim              Runtime intelligence and routing signals.

Core Concepts

Concept              Meaning
Endpoint             A stable serving interface for applications.
Inference Gateway    The request entry point that authenticates, routes, and observes inference calls.
Neural Router        The routing decision engine for deployment selection, traffic split, failover, and policy enforcement.
Rollout              A controlled release pattern such as canary, blue/green, or shadow.
Snapshot             A saved view of endpoint or routing configuration for review and rollback.
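To picture how a weighted traffic split interacts with failover, here is a minimal sketch. It is not the Neural Router's actual implementation: the `Deployment` class, its `weight` and `healthy` fields, and `choose_deployment` are hypothetical names, and the sketch assumes weighted random selection in which unhealthy deployments are simply excluded.

```python
import random
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical model of one backend deployment behind an endpoint.
    name: str
    weight: float  # share of traffic, e.g. 90 for stable, 10 for canary
    healthy: bool = True


def choose_deployment(deployments: list, rng: random.Random) -> Deployment:
    """Pick a healthy deployment in proportion to its traffic weight.

    Unhealthy deployments are skipped, so their share of traffic fails
    over to the remaining healthy ones (weights are renormalized).
    """
    candidates = [d for d in deployments if d.healthy and d.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy deployment available for this endpoint")
    return rng.choices(candidates, weights=[d.weight for d in candidates], k=1)[0]


# Usage: a 90/10 canary split, then failover when the canary turns unhealthy.
rng = random.Random(0)
deployments = [Deployment("stable", 90), Deployment("canary", 10)]
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[choose_deployment(deployments, rng).name] += 1
print(counts)  # close to a 90/10 split

deployments[1].healthy = False  # canary fails its health checks
print(choose_deployment(deployments, rng).name)  # stable
```

Excluding unhealthy deployments before sampling keeps the relative weights of the survivors intact, which is one simple way to express failover; real routers may also use retries, priorities, or regional preferences.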

Common Workflows

Create a serving key

  1. Open Serving -> Keys.
  2. Create a key for a specific application or environment.
  3. Store it in a secret manager.
  4. Set usage limits, if the option is available.
  5. Rotate keys on a regular schedule, and revoke keys that are no longer needed.
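A minimal sketch of steps 3 and 5 from the application side, assuming the secret manager exposes the key to the application as an environment variable. The variable name `SERVING_API_KEY`, the 90-day rotation window, and both helper functions are hypothetical; use whatever your secret manager and key policy actually provide.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumptions for illustration only: the secret manager injects the key
# into this environment variable, and keys are rotated every 90 days.
SERVING_KEY_ENV = "SERVING_API_KEY"
ROTATION_WINDOW = timedelta(days=90)


def load_serving_key() -> str:
    """Read the serving key from the environment instead of hard-coding it."""
    key = os.environ.get(SERVING_KEY_ENV)
    if not key:
        raise RuntimeError(f"{SERVING_KEY_ENV} is not set; check the secret manager integration")
    return key


def rotation_due(created_at: datetime, now: Optional[datetime] = None,
                 window: timedelta = ROTATION_WINDOW) -> bool:
    """Return True once a key is at least as old as the rotation window."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= window


# Usage: a key created on 2024-01-01 is not yet due after 60 days, but is after 91.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(rotation_due(created, now=datetime(2024, 3, 1, tzinfo=timezone.utc)))  # False
print(rotation_due(created, now=datetime(2024, 4, 1, tzinfo=timezone.utc)))  # True
```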

Investigate serving latency

  1. Open Serving -> Observability.
  2. Filter by endpoint and time range.
  3. Inspect latency percentiles and error rate.
  4. Open traces for slow requests.
  5. Check routing policy and backend deployment health.
  6. If needed, apply a scaling, rollout, or routing change.
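If you export request records for offline analysis, steps 3 and 4 can be approximated with a short script. This is a sketch under assumptions: the `RequestRecord` fields are hypothetical, percentiles use the nearest-rank method, and only HTTP 5xx responses count as errors. The Observability page may compute these figures differently.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical shape of an exported request log entry.
    request_id: str
    latency_ms: float
    status: int


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(records: list, slow_n: int = 3) -> dict:
    """Latency percentiles, 5xx error rate, and the slowest request IDs."""
    latencies = [r.latency_ms for r in records]
    errors = sum(1 for r in records if r.status >= 500)
    slowest = sorted(records, key=lambda r: r.latency_ms, reverse=True)[:slow_n]
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(records),
        # Candidates for trace inspection (step 4).
        "slowest_ids": [r.request_id for r in slowest],
    }


# Usage: 100 synthetic requests with latencies 1..100 ms and two 5xx responses.
records = [RequestRecord(f"req-{i}", float(i), 500 if i % 50 == 0 else 200)
           for i in range(1, 101)]
print(summarize(records))
```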

Best Practices