Middleware is how Cognis adds production discipline around Client calls: retry, fallback, rate limits, redaction, prompt caching, planning, summarization. Each middleware wraps a Client and runs on every chat call. Multiple middlewares compose into a MiddlewarePipeline.
How it works
A middleware implements cognis::middleware::Middleware, a trait with one async method (call) that receives a MiddlewareCtx and an Arc<dyn Next>. The pipeline runs layers in reverse-push order: the most recently pushed layer is the outermost wrapper.
For example, if ModelRetry is pushed first and RegexRedactor second, RegexRedactor::call runs first, then ModelRetry::call, then the raw client. Push order is “innermost first.”
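The ordering rule can be illustrated with a small, synchronous toy model. This is only a sketch: ToyMiddleware, Tag, and ToyPipeline are hypothetical names invented for illustration, not the Cognis types, and the real trait is async. Each layer appends its name to the request on the way in, so the final string records the order in which layers ran.

```rust
// Toy model of "innermost first" push order. These types are NOT the Cognis
// API; they are hypothetical stand-ins that only demonstrate layer ordering.

trait ToyMiddleware {
    /// Handle the request, calling `next` to continue toward the client.
    fn call(&self, request: &str, next: &dyn Fn(&str) -> String) -> String;
}

/// A layer that appends its own name to the request before passing it on.
struct Tag(&'static str);

impl ToyMiddleware for Tag {
    fn call(&self, request: &str, next: &dyn Fn(&str) -> String) -> String {
        next(&format!("{request} > {}", self.0))
    }
}

struct ToyPipeline {
    layers: Vec<Box<dyn ToyMiddleware>>, // index 0 is the innermost layer
}

impl ToyPipeline {
    fn new() -> Self {
        ToyPipeline { layers: Vec::new() }
    }

    /// Push a layer; each later push wraps all earlier ones.
    fn push(mut self, layer: impl ToyMiddleware + 'static) -> Self {
        self.layers.push(Box::new(layer));
        self
    }

    /// Run the request through every layer, outermost (last pushed) first.
    fn run(&self, request: &str, client: &dyn Fn(&str) -> String) -> String {
        self.run_from(self.layers.len(), request, client)
    }

    fn run_from(&self, depth: usize, request: &str, client: &dyn Fn(&str) -> String) -> String {
        if depth == 0 {
            return client(request); // no layers left: hit the raw client
        }
        let next = |req: &str| self.run_from(depth - 1, req, client);
        self.layers[depth - 1].call(request, &next)
    }
}

fn main() {
    let pipeline = ToyPipeline::new()
        .push(Tag("retry"))   // pushed first: innermost
        .push(Tag("redact")); // pushed last: outermost, runs first
    let client = |req: &str| format!("{req} > client");
    println!("{}", pipeline.run("request", &client));
}
```

In this toy, the layer pushed last ("redact") sees the request before the layer pushed first ("retry"), which matches the reverse-push rule described above.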
Middleware is not auto-wired into AgentBuilder in v0.3. To run middleware inside an agent loop, wrap your client into a PipelinedClient and serve it through a custom LLMProvider; see Wiring middleware into an agent below.
What’s in the box
The full catalog lives under cognis::middleware::*. Reach for these by job:
Resilience
| Middleware | Constructor |
|---|---|
| ModelRetry | ModelRetry::new(max_attempts); exponential backoff (100ms initial, 2x, 30s cap) by default |
| ModelFallback | ModelFallback::new(fallback_client: Client) |
| Recovery (FixedRecovery, FnRecovery) | Custom recovery on errors |
| ToolRetry (ToolRetryClassifier) | ToolRetry::new(max_attempts); retries tool calls the model emitted that failed |
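To make the ModelRetry defaults concrete, the following sketch computes the delay schedule implied by "100ms initial, 2x, 30s cap". The function name backoff_delay is hypothetical, and the sketch assumes the first retry waits the full initial delay and that no jitter is applied; the source does not confirm either detail.

```rust
use std::time::Duration;

/// Delay before retry number `retry` (0-based), assuming a 100 ms initial
/// delay that doubles on each retry and is capped at 30 s.
/// Hypothetical helper for illustration; jitter is not modeled.
fn backoff_delay(retry: u32) -> Duration {
    const INITIAL_MS: u64 = 100;
    const CAP_MS: u64 = 30_000;
    // 2^retry, saturating instead of overflowing for very large retry counts.
    let factor = 1u64.checked_shl(retry).unwrap_or(u64::MAX);
    Duration::from_millis(INITIAL_MS.saturating_mul(factor).min(CAP_MS))
}

fn main() {
    // Print the first ten delays in the schedule.
    for retry in 0..10 {
        println!("retry {retry}: {:?}", backoff_delay(retry));
    }
}
```

Under these assumptions the cap is reached on the tenth retry (index 9), because 100 ms × 2^9 = 51.2 s exceeds 30 s.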
Rate and cost
| Middleware | Constructor |
|---|---|
| RateLimit | RateLimit::new(Arc::new(TokenBucket::new(rate_per_sec, burst))); also accepts SlidingWindow, Composite, CostBased |
| ModelCallLimit | ModelCallLimit::new(cap); hard cap on calls per pipeline run |
| ToolCallLimit | ToolCallLimit::new(cap) |
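For readers unfamiliar with the rate_per_sec and burst parameters, here is a minimal sketch of the general token-bucket technique. ToyTokenBucket is a hypothetical type, not the Cognis TokenBucket, and it takes the current time as an explicit argument (in seconds) so the behavior is deterministic; how Cognis tracks time or waits for tokens is not specified in the source.

```rust
/// Toy token bucket, NOT the Cognis `TokenBucket`. Tokens refill at
/// `rate_per_sec` and never exceed `burst`; each call costs one token.
struct ToyTokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    last_refill: f64, // seconds, supplied by the caller
}

impl ToyTokenBucket {
    /// Start with a full bucket at time 0.
    fn new(rate_per_sec: f64, burst: f64) -> Self {
        ToyTokenBucket { rate_per_sec, burst, tokens: burst, last_refill: 0.0 }
    }

    /// Try to take one token at time `now` (seconds). Returns true if allowed.
    fn try_acquire(&mut self, now: f64) -> bool {
        if now > self.last_refill {
            // Refill for the elapsed time, clamped to the burst size.
            let refill = (now - self.last_refill) * self.rate_per_sec;
            self.tokens = (self.tokens + refill).min(self.burst);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 2 calls per second sustained, bursts of up to 3.
    let mut bucket = ToyTokenBucket::new(2.0, 3.0);
    for i in 0..4 {
        println!("t=0.0 call {i}: allowed = {}", bucket.try_acquire(0.0));
    }
    println!("t=0.5 call: allowed = {}", bucket.try_acquire(0.5));
}
```

The burst value bounds how many calls can go through back to back, while rate_per_sec bounds the sustained rate once the bucket is drained.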
Privacy
| Middleware | Constructor |
|---|---|
| PiiRedactor | PiiRedactor::new(); masks common PII patterns (emails, phones, etc.) |
| RegexRedactor | RegexRedactor::new(); bring your own patterns |
Prompt and context
| Middleware | Constructor |
|---|---|
| PromptCaching | PromptCaching::new() (or ::default()); Anthropic prompt-cache markers |
| ContextEditing | ContextEditing::new(policy); mutate messages before they go to the model |
| ContextInjection | ContextInjection::new(provider); inject context derived from your app state |
| Summarization | Summarization::new(keep_last); compress old turns when the transcript grows |
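The keep_last idea behind Summarization can be sketched as follows. This is an assumption-laden illustration, not the Cognis implementation: compact is a hypothetical function, messages are modeled as plain strings, and the summary is a placeholder where a real summarizer would presumably call a model.

```rust
/// Keep the last `keep_last` messages verbatim and fold everything older
/// into a single placeholder entry. Hypothetical helper for illustration;
/// a real summarizer would generate the summary text with a model.
fn compact(transcript: &[String], keep_last: usize) -> Vec<String> {
    if transcript.len() <= keep_last {
        return transcript.to_vec(); // nothing old enough to compress
    }
    let split = transcript.len() - keep_last;
    let mut out = Vec::with_capacity(keep_last + 1);
    out.push(format!("[summary of {split} earlier messages]"));
    out.extend_from_slice(&transcript[split..]);
    out
}

fn main() {
    let transcript: Vec<String> = (1..=5).map(|i| format!("m{i}")).collect();
    for line in compact(&transcript, 2) {
        println!("{line}");
    }
}
```

The transcript length stays bounded at keep_last + 1 entries once compression kicks in, which is the property that keeps long conversations within the context window.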
Planning and todos
| Middleware | Constructor |
|---|---|
| Planning | Planning::new() |
| TodoMiddleware | Maintain an internal task list the agent can read and update |
Tools
| Middleware | Effect |
|---|---|
| ToolEmulator (MapEmulator, EmulatorSource) | Replay tool calls deterministically; great for tests |
| ToolSelection | Steer the model toward a subset of tools per turn |
| PatchToolCalls (FnToolCallPatcher) | Fix or rewrite the model’s tool calls before dispatch |
Approval-gated tools use Approver + AgentBuilder::with_approver, not middleware. See Human-in-the-loop.
Workspace and subagents
| Middleware | Effect |
|---|---|
| FilesystemMiddleware | Expose a virtual workspace to the model |
| SubagentMiddleware (SubagentRouter) | Spawn subagents from inside the pipeline for context isolation |
Quick example — production stack
A reasonable default stack for a customer-facing client composes several of the middlewares above into one MiddlewarePipeline.
Wiring middleware into an agent
AgentBuilder accepts a raw Client; it doesn’t take a PipelinedClient directly. Two ways to bridge:
Option 1: use PipelinedClient standalone, for code that calls the model directly without the agent harness.
Option 2: write a custom LLMProvider that delegates to the pipeline. Wrap that provider into a Client, then hand it to AgentBuilder.
Writing your own middleware
The trait is small: one async method, call, which receives a MiddlewareCtx and an Arc<dyn Next>.
See also
Production → Resilience
Patterns for ModelRetry, ModelFallback, and Recovery.
Production → Security
PII redaction, deny-lists, SSRF protection.
Human-in-the-loop
Approval-gated tools — different from middleware.
Reference → cognis
Full middleware re-export list.