Overview

Neural Router is the brain behind inwire's inference data plane. It receives validated requests from the Inference Gateway and makes real-time routing decisions using a combination of routing policies, live telemetry, and an embedded ML model called RIM (Routing Intelligence Model). It supports canary deployments, blue-green rollouts, shadow traffic, circuit breaking, and intelligent failover, all without calling any external service. It is stateless, fast, and designed for production environments where milliseconds and reliability matter most.

Capabilities

RIM: Routing Intelligence Model
An embedded machine learning model that predicts optimal routing targets based on current load, model latency, queue depth, and historical performance. Advisory by default. Operators always retain override control.
Advanced Rollout Strategies
Native support for canary deployments with configurable traffic splits, blue-green switching with instant rollback, and shadow traffic for testing new model versions against production load without user impact.
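
As an illustration of a configurable traffic split, here is a minimal sketch of one common approach: hashing a request ID into a bucket so that the same request always lands on the same side of the split. The function name `pick_version`, the `canary_percent` parameter, and the hash-based scheme are assumptions for this example, not Neural Router's documented behavior.

```python
import hashlib


def pick_version(request_id: str, canary_percent: float) -> str:
    """Return "canary" for roughly canary_percent of request IDs, else "stable".

    Hypothetical helper: the split is deterministic per request ID.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # canary_percent is in percent, so 10 means buckets 0..999.
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    sample = [pick_version(f"req-{i}", 10) for i in range(1000)]
    print(sample.count("canary"), "of 1000 requests went to the canary")
```

Setting `canary_percent` to 0 sends everything to the stable version, which is one simple way to model an instant rollback.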
Intelligent Failover and Fallback
Automatic health probing of all targets. When a deployment goes unhealthy, traffic reroutes to healthy replicas within milliseconds. Configurable failover chains with priority ordering.
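
A failover chain with priority ordering can be pictured as walking an ordered list and taking the first target that currently passes its health check. The sketch below assumes a hypothetical `first_healthy` helper and an in-memory health map; the real probing mechanism is not described here.

```python
from typing import Callable, Optional, Sequence


def first_healthy(chain: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy target in priority order, or None."""
    for target in chain:
        if is_healthy(target):
            return target
    return None  # every target in the chain is unhealthy


if __name__ == "__main__":
    # Hypothetical health state, as a health prober might report it.
    health = {"primary": False, "replica-1": True, "replica-2": True}
    chain = ["primary", "replica-1", "replica-2"]
    print(first_healthy(chain, lambda name: health.get(name, False)))
```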
Admission Control
Four-mode admission system: accept, queue, reject, or degrade. Under heavy load, the router can queue requests, reject low-priority traffic, or serve degraded responses (smaller model, cached results) rather than dropping everything.
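
To make the four modes concrete, the following sketch shows one way such a decision could be ordered. The `admit` function, its parameters, and the thresholds (`soft_limit`, `min_priority`) are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Admission(Enum):
    ACCEPT = "accept"
    QUEUE = "queue"
    REJECT = "reject"
    DEGRADE = "degrade"


def admit(load: float, queue_free: int, priority: int, has_fallback: bool,
          soft_limit: float = 0.8, min_priority: int = 5) -> Admission:
    """Pick an admission mode for one request (illustrative policy only)."""
    if load < soft_limit:
        return Admission.ACCEPT       # normal operation
    if queue_free > 0:
        return Admission.QUEUE        # overloaded, but the queue has room
    if priority < min_priority:
        return Admission.REJECT       # shed low-priority traffic first
    # High-priority request with a full queue: serve a degraded response
    # (smaller model or cached result) if one is available.
    return Admission.DEGRADE if has_fallback else Admission.REJECT


if __name__ == "__main__":
    print(admit(load=0.95, queue_free=0, priority=9, has_fallback=True))
```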
Per-Target Circuit Breaker
Individual circuit breakers for each deployment target. A single bad replica doesn't take down the entire endpoint. Configurable thresholds, half-open probing, and automatic recovery.
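
The closed, open, and half-open cycle behind a per-target breaker can be sketched as a small state machine. This is a generic circuit breaker under assumed names and defaults (`failure_threshold`, `recovery_seconds`), not the router's actual implementation; one instance would be kept per deployment target.

```python
import time


class CircuitBreaker:
    """Minimal closed / open / half-open breaker for a single target."""

    def __init__(self, failure_threshold=3, recovery_seconds=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock            # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may be sent to this target right now."""
        if self.state == "closed":
            return True
        cooled_down = self.clock() - self.opened_at >= self.recovery_seconds
        if self.state == "open" and cooled_down:
            self.state = "half_open"  # let exactly one probe through
            return True
        return False                  # still cooling down, or probe in flight

    def record(self, success: bool) -> None:
        """Report the outcome of a request that allow() admitted."""
        if success:
            self.state = "closed"     # automatic recovery
            self.failures = 0
            return
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


# One breaker per target, so a bad replica only trips its own breaker.
breakers = {name: CircuitBreaker() for name in ("replica-a", "replica-b")}
```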
Decision Pipeline Architecture
Every routing decision flows through a structured pipeline: Gates (hard policy checks) → Filter (eligible targets) → Predict (RIM scoring) → Clamp (enforce hard caps). Fully auditable, fully explainable.
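
The four stages can be sketched as a single function that also builds a trace of what each stage did. Everything below is an assumption made for illustration: the `Target` fields, the `authorized` gate, the interpretation of Clamp as a per-target cap on traffic share, and the `score` callable standing in for RIM.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    healthy: bool
    queue_depth: int
    max_share: float  # hypothetical hard cap on this target's traffic share


def decide(request, targets, score):
    """Run Gates -> Filter -> Predict -> Clamp and return (choice, trace)."""
    trace = {}
    # Gates: hard policy checks that can reject the request outright.
    if not request.get("authorized", False):
        return None, {"gates": "rejected: unauthorized"}
    trace["gates"] = "passed"
    # Filter: keep only eligible targets.
    eligible = [t for t in targets if t.healthy]
    trace["eligible"] = [t.name for t in eligible]
    # Predict: score each eligible target and normalize to shares.
    raw = {t.name: score(t) for t in eligible}
    total = sum(raw.values())
    if total <= 0:
        return None, trace
    predicted = {name: s / total for name, s in raw.items()}
    # Clamp: the prediction can never exceed a target's hard cap.
    clamped = {t.name: min(predicted[t.name], t.max_share) for t in eligible}
    trace["predicted"], trace["clamped"] = predicted, clamped
    return max(clamped, key=clamped.get), trace


if __name__ == "__main__":
    targets = [Target("a", True, 0, 0.1), Target("b", True, 3, 1.0),
               Target("c", False, 0, 1.0)]
    choice, trace = decide({"authorized": True}, targets,
                           score=lambda t: 1.0 / (1 + t.queue_depth))
    print(choice, trace)
```

In this example the scoring function prefers target `a`, but its cap of 0.1 pulls it below `b`, so the clamp stage changes the outcome and the trace records both the prediction and the clamped values.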
Routing Telemetry and Feedback Loops
Collects per-request telemetry (chosen target, latency, tokens generated, queue time) and feeds it back for RIM retraining and dashboard visualization. Every routing decision is recorded and analyzable.
Zero External Dependencies in the Hot Path
No database calls, no control-plane API calls, no external service dependencies during request routing. All state comes from locally cached routing snapshots. Sub-millisecond decision latency.
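
One way to keep the hot path free of external calls is to read from an immutable in-memory snapshot that a background process replaces wholesale. The sketch below assumes hypothetical `RoutingSnapshot` and `SnapshotCache` types; how snapshots actually reach the router is not specified here.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class RoutingSnapshot:
    version: int
    weights: Mapping[str, float]  # read-only view of per-target weights


class SnapshotCache:
    """Holds the current snapshot; readers never block or perform I/O."""

    def __init__(self, initial: RoutingSnapshot):
        self._current = initial
        self._lock = threading.Lock()  # serializes writers only

    def get(self) -> RoutingSnapshot:
        # Hot path: a single in-memory reference read.
        return self._current

    def publish(self, snapshot: RoutingSnapshot) -> None:
        # Off the hot path: swap in a newer snapshot, ignore stale ones.
        with self._lock:
            if snapshot.version > self._current.version:
                self._current = snapshot


if __name__ == "__main__":
    cache = SnapshotCache(RoutingSnapshot(1, MappingProxyType({"a": 1.0})))
    cache.publish(RoutingSnapshot(2, MappingProxyType({"a": 0.9, "b": 0.1})))
    print(cache.get().version, dict(cache.get().weights))
```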
Weighted and Priority-Based Selection
Combine weighted random selection with strict priority ordering. Support for affinity rules, geographic preferences, and cost-based routing. RIM predictions can augment or override weight-based selection.
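
Combining strict priority with weighted random selection can be sketched as two steps: narrow the candidates to the best priority tier, then draw within that tier in proportion to weight. The tuple layout and the convention that a lower number means higher priority are assumptions for this example.

```python
import random


def select(targets, rng=random):
    """Pick a target name from (name, priority, weight) tuples.

    Lower priority numbers win; weights only matter within the winning tier.
    """
    usable = [t for t in targets if t[2] > 0]
    if not usable:
        return None
    top = min(priority for _, priority, _ in usable)
    tier = [t for t in usable if t[1] == top]
    names = [name for name, _, _ in tier]
    weights = [weight for _, _, weight in tier]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    targets = [("a", 0, 90), ("b", 0, 10), ("backup", 1, 100)]
    print(select(targets))
```

With these inputs `backup` is never chosen while `a` or `b` has a positive weight, which is what makes the priority ordering strict rather than a soft preference.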
Explainable Routing Decisions
Every routing decision can be explained via the /explain endpoint. See exactly why a particular target was chosen, which gates passed, what RIM predicted, and what alternatives were considered. Essential for debugging and compliance.
