Neural Router

The routing brain for production inference

RIM-powered decisions, canary and blue-green rollouts, circuit breakers, and intelligent failover, without databases or control-plane calls in the hot path.

Overview

Neural Router is the brain behind inwire's inference data plane. It receives validated requests from the Inference Gateway and makes real-time routing decisions by combining routing policies, live telemetry, and an embedded ML model called RIM (Routing Intelligence Model). It supports canary deployments, blue-green rollouts, shadow traffic, circuit breaking, and intelligent failover, all without calling any external service in the hot path. It is stateless, fast, and designed for production environments where milliseconds and reliability are everything.

Capabilities

  • RIM: Routing Intelligence Model

    An embedded machine learning model that predicts optimal routing targets based on current load, model latency, queue depth, and historical performance. Advisory by default. Operators always retain override control.

  • Advanced Rollout Strategies

    Native support for canary deployments with configurable traffic splits, blue-green switching with instant rollback, and shadow traffic for testing new model versions against production load without user impact.

  • Intelligent Failover and Fallback

    Every target is health-probed automatically. When a deployment becomes unhealthy, traffic reroutes to healthy replicas within milliseconds. Failover chains are configurable, with priority ordering.

  • Admission Control

    Four-mode admission system: accept, queue, reject, or degrade. Under heavy load, the router can queue requests, reject low-priority traffic, or serve degraded responses (smaller model, cached results) rather than dropping everything.

  • Per-Target Circuit Breaker

    Individual circuit breakers for each deployment target. A single bad replica doesn't take down the entire endpoint. Configurable thresholds, half-open probing, and automatic recovery.

  • Decision Pipeline Architecture

    Every routing decision flows through a structured pipeline: Gates (hard policy checks) → Filter (eligible targets) → Predict (RIM scoring) → Clamp (enforce hard caps). Fully auditable, fully explainable.

  • Routing Telemetry and Feedback Loops

    Collects per-request telemetry (chosen target, latency, tokens generated, queue time) and feeds it back for RIM retraining and dashboard visualization. Every routing decision is recorded and analyzable.

  • Zero External Dependencies in the Hot Path

    No database calls, no control-plane API calls, and no external service dependencies during request routing. All state comes from locally cached routing snapshots, which keeps decision latency under a millisecond.

  • Weighted and Priority-Based Selection

    Combines weighted random selection with strict priority ordering. Supports affinity rules, geographic preferences, and cost-based routing. RIM predictions can augment or override weight-based selection.

  • Explainable Routing Decisions

    Every routing decision can be explained via the /explain endpoint. See exactly why a particular target was chosen, which gates passed, what RIM predicted, and what alternatives were considered. Essential for debugging and compliance.
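
To make the Gates → Filter → Predict → Clamp flow described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not inwire's implementation: the Target fields, the "authorized" request flag, the rim_scores dictionary standing in for RIM output, the max_share cap, and the route and pick functions are all hypothetical names chosen for the example. The trace list hints at how each stage could be recorded for later explanation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    healthy: bool = True
    circuit_open: bool = False   # hypothetical per-target circuit breaker state
    weight: float = 1.0          # operator weight, used when no score is given
    max_share: float = 1.0       # hard cap on traffic share (e.g. 0.1 for a canary)


@dataclass
class Decision:
    shares: dict = field(default_factory=dict)   # target name -> traffic share
    trace: list = field(default_factory=list)    # audit trail, one entry per stage


def route(request, targets, rim_scores):
    """Run one request through Gates -> Filter -> Predict -> Clamp."""
    d = Decision()

    # Gates: hard policy checks; a failure ends the pipeline immediately.
    if not request.get("authorized", False):
        d.trace.append("gates: rejected (not authorized)")
        return d
    d.trace.append("gates: passed")

    # Filter: keep only targets that are healthy and whose breaker is closed.
    eligible = [t for t in targets if t.healthy and not t.circuit_open]
    d.trace.append("filter: eligible=" + ",".join(t.name for t in eligible))
    if not eligible:
        return d

    # Predict: normalize scores into shares; rim_scores stands in for RIM.
    raw = {t.name: max(rim_scores.get(t.name, t.weight), 0.0) for t in eligible}
    total = sum(raw.values())
    if total > 0:
        shares = {n: s / total for n, s in raw.items()}
    else:
        shares = {n: 1.0 / len(raw) for n in raw}
    d.trace.append(f"predict: {shares}")

    # Clamp: enforce hard caps, then hand the excess to targets with headroom.
    caps = {t.name: t.max_share for t in eligible}
    clamped = {n: min(s, caps[n]) for n, s in shares.items()}
    excess = 1.0 - sum(clamped.values())
    room = {n: caps[n] - clamped[n] for n in clamped}
    total_room = sum(room.values())
    if excess > 0 and total_room > 0:
        scale = min(1.0, excess / total_room)
        for n in clamped:
            clamped[n] += room[n] * scale
    d.shares = clamped
    d.trace.append(f"clamp: {clamped}")
    return d


def pick(decision, rng=random):
    """Choose one target at random in proportion to the clamped shares."""
    names = list(decision.shares)
    weights = [decision.shares[n] for n in names]
    if not names or sum(weights) <= 0:
        return None
    return rng.choices(names, weights=weights)[0]


if __name__ == "__main__":
    targets = [
        Target("stable"),
        Target("canary", max_share=0.1),
        Target("broken", circuit_open=True),
    ]
    decision = route({"authorized": True}, targets, {"stable": 0.5, "canary": 0.5})
    for line in decision.trace:
        print(line)
    print("chosen:", pick(decision))
```

In this sketch the prediction stage is purely advisory: even if the score for the canary is high, the clamp stage holds it to its configured cap and returns the remainder to targets with headroom, which mirrors the idea that operators retain override control.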