LLM providers fail. Networks blip. Rate limits get hit. Cognis ships idiomatic patterns for all of these as `Runnable` wrappers and middleware, so adding resilience is one line, not a refactor.
The mental model

Three layers, picked by which kind of failure you’re absorbing:

- `Runnable` wrappers — apply to any `Runnable`: a `Client`, a tool, a chain. Best for individual-call resilience.
- Agent middleware — applies to every model call inside the agent loop. Best for cross-cutting policy.
- Strategy — domain-specific recovery (LLM-as-judge, escalation chains, retry-with-different-model). Best when generic retry isn’t enough.
Quick example
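A minimal sketch of the wrapper pattern, with stub types standing in for Cognis’s real `Runnable` and `Client` (only the concept names come from this page; the trait shape and everything else here are illustrative):

```rust
// Stub types for illustration; Cognis's real Runnable/Client differ.
trait Runnable {
    fn invoke(&self, input: &str) -> Result<String, String>;
}

struct Client;
impl Runnable for Client {
    fn invoke(&self, input: &str) -> Result<String, String> {
        Ok(format!("response to: {input}"))
    }
}

// A retry wrapper is itself a Runnable, so wrappers nest: calls like
// client.with_max_retries(3).with_fallback(backup) build layered types.
struct Retries<R> {
    inner: R,
    attempts: u32,
}
impl<R: Runnable> Runnable for Retries<R> {
    fn invoke(&self, input: &str) -> Result<String, String> {
        let mut last = Err("no attempts made".to_string());
        for _ in 0..self.attempts {
            last = self.inner.invoke(input);
            if last.is_ok() {
                break;
            }
        }
        last
    }
}

struct Fallback<A, B> {
    primary: A,
    backup: B,
}
impl<A: Runnable, B: Runnable> Runnable for Fallback<A, B> {
    fn invoke(&self, input: &str) -> Result<String, String> {
        self.primary.invoke(input).or_else(|_| self.backup.invoke(input))
    }
}
```

Because every wrapper is again a `Runnable`, the composed value can be passed anywhere a plain client is expected.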
A production-grade `Client` chains retries, a timeout, and a fallback into a single expression.

Wrappers reference
| Wrapper | What it absorbs |
|---|---|
| `with_max_retries(n)` | Transient errors, rate limits. Default policy is exponential. |
| `with_retry(RetryPolicy)` | Custom policy — exponential, linear, fixed. |
| `with_timeout(Duration)` | Slow responses (provider hangs). |
| `with_fallback(other)` | Total failure of the primary. `other` must also be `Runnable<I, O>`. |
| `with_memory_cache(key_fn)` | Repeated identical inputs (deduplicate at the call layer). |
Middleware reference
For policies that apply on every model call regardless of caller, use the middleware pipeline. Build a `PipelinedClient` and either use it directly or feed it through a custom provider when you need it inside an `AgentBuilder` agent — see Middleware → Wiring middleware into an agent.
| Middleware | Effect |
|---|---|
| `ModelRetry` | Retry transient errors and 429s with backoff. |
| `ModelFallback` | Fall through to a backup model. |
| Recovery (`FixedRecovery`, `FnRecovery`) | Custom recovery on errors. |
| `RateLimit` (`TokenBucket`, `SlidingWindow`, `CostBased`, `Composite`) | Rate-limit per minute / per second / by cost. |
| `ModelCallLimit`, `ToolCallLimit` | Hard caps per pipeline run. |
| `ToolRetry` (`ToolRetryClassifier`) | Retry failed tool calls emitted by the model. |
RateLimit pushed last means the limiter sees every retry attempt. See Middleware for the full catalog.
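A toy model of that ordering rule, using string labels instead of real middleware (this is not the Cognis API, just an illustration of how a push-order chain nests):

```rust
// Toy pipeline: the chain is built so the first-pushed layer ends up
// outermost and the last-pushed layer sits closest to the model, which
// is why pushing RateLimit last makes it see every retry attempt.
type Layer = Box<dyn Fn(&str) -> String>;

struct Pipeline {
    names: Vec<&'static str>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { names: Vec::new() }
    }

    fn push(mut self, name: &'static str) -> Self {
        self.names.push(name);
        self
    }

    // Wrap the base model starting from the last-pushed layer, so the
    // first-pushed layer becomes the outermost wrapper.
    fn build(self) -> Layer {
        let base: Layer = Box::new(|input: &str| format!("model({input})"));
        self.names.into_iter().rev().fold(base, |next, name| {
            Box::new(move |input: &str| format!("{name}({})", next(input)))
        })
    }
}
```

Calling the built chain prints the nesting directly, which makes ordering bugs easy to spot in a unit test.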
Retry policies
`RetryPolicy::new(attempts)` is the default exponential policy. For finer control, construct a `RetryPolicy` with an explicit backoff strategy (exponential, linear, or fixed).
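As an illustration of what an exponential policy computes, a standalone sketch (`base` and `factor` are hypothetical parameter names, not the real `RetryPolicy` knobs):

```rust
use std::time::Duration;

// Hypothetical knobs: `base` is the first delay, `factor` the multiplier
// applied on each subsequent attempt.
fn backoff_delays(attempts: u32, base: Duration, factor: u32) -> Vec<Duration> {
    (0..attempts).map(|i| base * factor.pow(i)).collect()
}
```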
Rate limiting strategies
`RateLimit` accepts any `RateLimiter` impl. Built-ins:
| Limiter | Behavior |
|---|---|
| `TokenBucket::new(rate_per_sec, burst)` | Refill at `rate_per_sec` tokens/sec with a configurable burst. Best general-purpose. |
| `SlidingWindow` | Stricter over a fixed window — useful for compliance bounds. |
| `CostBased` | Charges cost, not call count. Lets cheap calls flow but throttles expensive ones. |
| `Composite` | Combine multiple limiters (per-second AND per-minute AND per-day). |
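A minimal token-bucket sketch showing the refill/burst mechanics (illustrative only, not Cognis’s `TokenBucket`; time is passed in explicitly here to keep the sketch testable):

```rust
// Illustrative token bucket: refills at `rate_per_sec`, caps at `burst`.
// `now` is seconds since some start; a real limiter would use Instant.
struct TokenBucket {
    tokens: f64,
    burst: f64,
    rate_per_sec: f64,
    last: f64,
}

impl TokenBucket {
    fn new(rate_per_sec: f64, burst: f64) -> Self {
        TokenBucket { tokens: burst, burst, rate_per_sec, last: 0.0 }
    }

    // Returns true if the call may proceed, consuming one token.
    fn try_acquire(&mut self, now: f64) -> bool {
        let refill = (now - self.last) * self.rate_per_sec;
        self.tokens = (self.tokens + refill).min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```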
When retries don’t fit
Some failures aren’t transient. The model emitted bad JSON. The tool returned a 4xx your code can fix. Use recovery middleware for these. To run recovery inside an `AgentBuilder` agent, see the bridging pattern in Middleware → Wiring middleware into an agent.
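The shape of the idea, written as a free function rather than the actual `FnRecovery` middleware (the name and signature here are illustrative):

```rust
// Illustrative recovery: if the error is one we know how to fix,
// substitute a repaired value instead of retrying.
fn recover<T>(
    result: Result<T, String>,
    fix: impl Fn(&str) -> Option<T>,
) -> Result<T, String> {
    match result {
        Ok(v) => Ok(v),
        // Errors the fixer declines to handle pass through unchanged.
        Err(e) => fix(&e).ok_or(e),
    }
}
```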
How it works
- Wrappers compose by re-wrapping. `client.with_max_retries(3).with_timeout(d)` builds nested `Runnable`s — types are explicit at every layer.
- Middleware runs outside-in, in push order: the first-pushed layer is outermost, and the most-recently-pushed sits closest to the model. `pipeline.push(ModelFallback).push(ModelRetry).push(RateLimit)` means the rate limiter sees the original call, then retry runs (each retry hits the limiter again), and fallback fires only when retries are exhausted.
- Errors carry structure. `CognisError::RateLimited { retry_after_ms }` lets the retry policy honor the provider’s hint. `CognisError::ProviderError { provider, message, .. }` distinguishes between provider classes.
- Cancellation is cooperative. All wrappers honor `RunnableConfig::cancel_token` and `deadline`.
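For example, a policy can wait for the larger of its own planned backoff and the provider’s hint. A sketch using the error shape above (the helper function is hypothetical):

```rust
// Error shape from this page; `Other` stands in for the remaining variants.
enum CognisError {
    RateLimited { retry_after_ms: u64 },
    Other,
}

// Hypothetical helper: wait at least as long as the provider asked.
fn next_delay_ms(err: &CognisError, planned_backoff_ms: u64) -> u64 {
    match err {
        CognisError::RateLimited { retry_after_ms } => {
            (*retry_after_ms).max(planned_backoff_ms)
        }
        CognisError::Other => planned_backoff_ms,
    }
}
```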
See also
- Middleware: the full middleware catalog.
- Caching: don’t pay for repeated calls.
- Going to production: putting it all together.