A working agent is a starting point. Going to production means adding the layers that aren’t about the agent’s behavior: keeping it up, watching it, and bounding what it can do. This page is a checklist. Each section links to the deeper guide.
Resilience
- ✅ Wrap the LLM client with `with_max_retries(3)` and `with_timeout(Duration::from_secs(30))`.
- ✅ Add a fallback model via `Client::with_fallback(...)` or `ModelFallback::new(backup_client)` middleware. A cheaper backup beats a 5xx.
- ✅ Retry on rate limits. `RetryPolicy::new(n)` with backoff handles 429s correctly.
- ✅ Cap retry attempts. Infinite retry is a bug. Use 3–5 attempts with exponential backoff.
- ✅ Cap loop iterations. Always set `with_max_iterations(n)` on agents.
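To make "capped attempts with exponential backoff" concrete, here is a minimal std-only sketch of the idea. It is not the Cognis `RetryPolicy` implementation; `backoff_delay` and `retry_with_backoff` are hypothetical names introduced for illustration, and the sleep function is injected so the schedule can be checked without waiting.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Hypothetical helper, not part of Cognis.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // On overflow, fall back to the cap instead of panicking.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `op` at most `max_attempts` times, sleeping between failed attempts.
/// `sleep` is injected: pass `std::thread::sleep` in real code, a recorder in tests.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max_delay: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error instead of looping forever.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max_delay));
                attempt += 1;
            }
        }
    }
}
```

With three attempts and a 100 ms base, a call that fails twice sleeps 100 ms and then 200 ms before the third attempt. Production code would usually add jitter and retry only on errors known to be transient, such as 429s and 5xx responses.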
Rate and cost
- ✅ Rate-limit upstream. `RateLimit::new(Arc::new(TokenBucket::new(rate_per_sec, burst)))` with the bucket sized to your provider tier.
- ✅ Track cost. Wire `cognis-trace` with `with_default_pricing()` so every call has a USD cost attached.
- ✅ Cap cost per run. `ModelCallLimit` and `ToolCallLimit` middleware bound how much one user request can spend.
- ✅ Cache obvious repeats. `CachedEmbeddings` for indexing; `with_memory_cache` for LLM calls; `PromptCaching` middleware for Anthropic-style prefix caching.
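For readers unfamiliar with how `rate_per_sec` and `burst` interact, the token-bucket idea can be sketched in a few lines. This is a stand-in for illustration, not the Cognis `TokenBucket`; the `SketchBucket` type and its `advance` method are assumptions, and time is passed in explicitly to keep the behavior deterministic.

```rust
use std::time::Duration;

/// Hypothetical token bucket: holds up to `burst` tokens, refilled at `rate_per_sec`.
pub struct SketchBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
}

impl SketchBucket {
    /// Starts full, so an idle client may send `burst` requests immediately.
    pub fn new(rate_per_sec: f64, burst: u32) -> Self {
        let burst = f64::from(burst);
        Self { rate_per_sec, burst, tokens: burst }
    }

    /// Credits tokens for `elapsed` wall time, never exceeding `burst`.
    pub fn advance(&mut self, elapsed: Duration) {
        let refill = self.rate_per_sec * elapsed.as_secs_f64();
        self.tokens = (self.tokens + refill).min(self.burst);
    }

    /// Takes one token if available; `false` means the caller should wait or shed load.
    pub fn try_acquire(&mut self) -> bool {
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

`burst` bounds how many calls can go out back to back, while `rate_per_sec` bounds the sustained rate. Sizing both slightly below the provider's published limits leaves headroom for retries.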
Observability
- ✅ Wire `cognis-trace`. `LangfuseExporter::from_env()` plus a `TracingHandler` plus the wrapped `HandlerObserver` on every `RunnableConfig`.
- ✅ Set trace metadata. `TraceMeta::session(...)`, `user(...)`, `release(...)`, `environment(...)`. Without these, you can’t filter your dashboards.
- ✅ Drain on shutdown. `handler.shutdown().await` before exit so batches flush.
- ✅ Log structured errors. Cognis uses `tracing`, so make sure your subscriber is on.
Security
- ✅ PII redaction. `PiiRedactor::default()` middleware on every customer-facing agent. Add `RegexRedactor` for domain-specific patterns.
- ✅ Tool deny-lists. Even if the model has no reason to call `delete_account`, deny-list it.
- ✅ HITL for the riskiest tools. Money, customer data, irreversible actions get an `Approver`.
- ✅ SSRF-safe HTTP tools. Use the built-in protected client; allow-list explicit hosts.
- ✅ Sandboxed FS for file-writing agents. `SandboxedFsBackend` over a scratch directory.
- ✅ No `.env` files. Use envchain / direnv / your secret manager.
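The deny-list and human-in-the-loop items amount to a three-way decision made before any tool call runs. The sketch below shows that decision in isolation; `ToolPolicy` and `ToolDecision` are hypothetical names, not Cognis types, and the tool names other than `delete_account` are invented for the example.

```rust
use std::collections::HashSet;

/// Outcome of checking a tool call before it runs (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ToolDecision {
    Allow,
    NeedsApproval,
    Deny,
}

/// Hypothetical policy holding a deny-list and an approval-list of tool names.
pub struct ToolPolicy {
    denied: HashSet<String>,
    needs_approval: HashSet<String>,
}

impl ToolPolicy {
    pub fn new(denied: &[&str], needs_approval: &[&str]) -> Self {
        Self {
            denied: denied.iter().map(|s| s.to_string()).collect(),
            needs_approval: needs_approval.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Deny wins over approval, so a tool on both lists is never callable.
    pub fn check(&self, tool_name: &str) -> ToolDecision {
        if self.denied.contains(tool_name) {
            ToolDecision::Deny
        } else if self.needs_approval.contains(tool_name) {
            ToolDecision::NeedsApproval
        } else {
            ToolDecision::Allow
        }
    }
}
```

Checking the deny-list first is the design choice that matters here: a misconfigured approval list cannot re-enable a tool that was explicitly denied.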
Persistence
- ✅ Pick a checkpointer. `SqliteCheckpointer` for single-host, `PostgresCheckpointer` for multi-process.
- ✅ Use stable thread ids. Tie them to your auth (user id, conversation id) so resume is deterministic.
- ✅ Test resume paths. Kill a process mid-run; verify the next call resumes correctly.
- ✅ Bound history. Don’t keep checkpoints forever; implement a TTL on your checkpointer.
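As an illustration of stable thread ids and a checkpoint TTL, the sketch below derives an id from auth data and prunes old entries from an in-memory list. The id format, the `CheckpointMeta` struct, and `retain_fresh` are assumptions made for the example; a real checkpointer would run the equivalent delete against its own storage.

```rust
use std::time::{Duration, SystemTime};

/// Derives a deterministic thread id from auth data (hypothetical format).
pub fn thread_id(user_id: &str, conversation_id: &str) -> String {
    format!("user:{user_id}/conv:{conversation_id}")
}

/// Minimal stand-in for a stored checkpoint's metadata.
pub struct CheckpointMeta {
    pub thread_id: String,
    pub saved_at: SystemTime,
}

/// Drops checkpoints older than `ttl` relative to `now`.
pub fn retain_fresh(checkpoints: &mut Vec<CheckpointMeta>, now: SystemTime, ttl: Duration) {
    checkpoints.retain(|c| match now.duration_since(c.saved_at) {
        Ok(age) => age <= ttl,
        // `saved_at` is later than `now` (clock skew): keep it rather than guess.
        Err(_) => true,
    });
}
```

Because the id depends only on the user and conversation, a restarted process computes the same id and can resume from the same checkpoint.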
Memory
- ✅ Pick a memory variant. `SummaryBufferMemory` is the safe default; switch when usage tells you otherwise.
- ✅ Pin the system prompt. Use `with_system(...)` on the memory so it survives summarization.
- ✅ Bound the budget. `SummaryBufferMemory::new(client, max_tokens)` with `max_tokens` matched to your model and your latency target.
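The interaction between a pinned system prompt and a token budget can be sketched without any model calls. Note the substitution: this example drops the oldest turns instead of summarizing them, and it counts whitespace-separated words as a rough proxy for tokens. `BoundedHistory` is a hypothetical type, not the Cognis `SummaryBufferMemory`.

```rust
use std::collections::VecDeque;

/// Rough token estimate: whitespace-separated words (an assumption, not a real tokenizer).
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical history buffer with a pinned system prompt and a token budget.
pub struct BoundedHistory {
    system: String,
    turns: VecDeque<String>,
    max_tokens: usize,
}

impl BoundedHistory {
    pub fn new(system: &str, max_tokens: usize) -> Self {
        Self { system: system.to_string(), turns: VecDeque::new(), max_tokens }
    }

    fn total_tokens(&self) -> usize {
        approx_tokens(&self.system)
            + self.turns.iter().map(|t| approx_tokens(t)).sum::<usize>()
    }

    /// Adds a turn, then evicts the oldest turns until the budget fits.
    /// The system prompt and the newest turn are never evicted.
    pub fn push(&mut self, turn: &str) {
        self.turns.push_back(turn.to_string());
        while self.total_tokens() > self.max_tokens && self.turns.len() > 1 {
            self.turns.pop_front();
        }
    }

    /// System prompt first, then the surviving turns in order.
    pub fn messages(&self) -> Vec<&str> {
        std::iter::once(self.system.as_str())
            .chain(self.turns.iter().map(|t| t.as_str()))
            .collect()
    }
}
```

A summarizing memory would replace the evicted turns with a model-written summary rather than discarding them, but the budget check and the pinned prompt work the same way.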
Streaming
- ✅ Stream by default. Long agents feel broken when they don’t stream. Use `stream_events` or `stream_mode`.
- ✅ Handle disconnects. When the client disconnects, cancel the run with `cancel_token`.
- ✅ Protect against backpressure. Bounded channels; drop or summarize on overflow.
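One way to picture "bounded channels; drop on overflow" is a thin wrapper over the standard library's `sync_channel` that never blocks the producer. This is a synchronous sketch with hypothetical names (`LossySender`, `SendOutcome`, `lossy_channel`); an async agent would apply the same pattern with its runtime's bounded channel.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// What happened to an event handed to the sender (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SendOutcome {
    Sent,
    /// Buffer was full; the event was discarded and counted.
    Dropped,
    /// Receiver is gone; the caller should cancel the run.
    Disconnected,
}

/// Non-blocking sender over a bounded channel that counts dropped events.
pub struct LossySender<T> {
    tx: SyncSender<T>,
    dropped: u64,
}

impl<T> LossySender<T> {
    pub fn send(&mut self, item: T) -> SendOutcome {
        match self.tx.try_send(item) {
            Ok(()) => SendOutcome::Sent,
            Err(TrySendError::Full(_)) => {
                self.dropped += 1;
                SendOutcome::Dropped
            }
            Err(TrySendError::Disconnected(_)) => SendOutcome::Disconnected,
        }
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}

/// Creates a bounded channel holding at most `capacity` buffered events.
pub fn lossy_channel<T>(capacity: usize) -> (LossySender<T>, Receiver<T>) {
    let (tx, rx) = sync_channel(capacity);
    (LossySender { tx, dropped: 0 }, rx)
}
```

The `Disconnected` outcome is the natural place to trigger cancellation, since nobody is left to read the stream. The dropped count is worth exporting as a metric so overflow is visible.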
Evals and CI
- ✅ Build a small golden set. 30–100 cases of “I know what the answer should be.”
- ✅ Run evals on every change. `EvalRunner` over the golden set; gate CI on regressions.
- ✅ Push scores to Langfuse. Tie eval scores to traces so you can drill from “this regressed” to “the trace that broke it.”
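Gating CI on regressions reduces to computing a score over the golden set and comparing it with a stored baseline. The sketch below uses exact-match scoring and a tolerance band; `GoldenCase`, `pass_rate`, and `gate` are hypothetical names for illustration and do not describe the Cognis `EvalRunner` API.

```rust
/// One golden-set entry: an input and the answer known to be correct.
pub struct GoldenCase {
    pub input: &'static str,
    pub expected: &'static str,
}

/// Fraction of cases where the agent's trimmed output equals the expected answer.
pub fn pass_rate(cases: &[GoldenCase], mut agent: impl FnMut(&str) -> String) -> f64 {
    if cases.is_empty() {
        return 0.0;
    }
    let passed = cases
        .iter()
        .filter(|c| agent(c.input).trim() == c.expected)
        .count();
    passed as f64 / cases.len() as f64
}

/// Fails when `current` falls more than `tolerance` below `baseline`.
pub fn gate(current: f64, baseline: f64, tolerance: f64) -> Result<(), String> {
    if current + tolerance < baseline {
        Err(format!(
            "eval regression: pass rate {current:.2} is below baseline {baseline:.2}"
        ))
    } else {
        Ok(())
    }
}
```

In CI, an `Err` from `gate` would translate into a non-zero exit code. Exact match is the simplest scorer; free-form answers usually need a more forgiving comparison.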
Deploys
- ✅ Build with the right features. `cargo build --release --features all-providers,langfuse,vectorstore-faiss` (adapt to your stack).
- ✅ Pin secrets in your runner’s secret store. Same env-var names as local dev.
- ✅ Health checks. Wire a tiny endpoint that calls `client.provider().health_check().await` so your platform’s load balancer can detect bad pods.
- ✅ Graceful shutdown. Drain the trace handler, cancel in-flight requests, then exit.
Cost containers
- ✅ Per-tenant limits. Different rate buckets for different customers. Use one bucket per tenant; share the same `RateLimit` middleware.
- ✅ Daily caps. Sliding-window or composite limiter that stops a runaway use case before it surprises your bill.
- ✅ Fallback to cheaper. Production stack: try the good model, fall back to a cheap one, then return a graceful error.
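A per-tenant daily cap can be sketched as a map from tenant to the amount spent on the current day. This version uses a fixed calendar-day window rather than the sliding window mentioned above, because it is the simplest variant to reason about. `DailyCaps` and its methods are hypothetical, and costs are tracked in integer cents to avoid floating-point drift.

```rust
use std::collections::HashMap;

/// Hypothetical per-tenant spend tracker with a shared daily cap, in cents.
pub struct DailyCaps {
    cap_cents: u64,
    /// tenant -> (day index, cents spent on that day)
    spent: HashMap<String, (u64, u64)>,
}

impl DailyCaps {
    pub fn new(cap_cents: u64) -> Self {
        Self { cap_cents, spent: HashMap::new() }
    }

    /// Records `cost_cents` for `tenant` on `day` (for example, days since the
    /// Unix epoch) and returns `true`, or returns `false` without recording
    /// if the spend would exceed the cap.
    pub fn try_spend(&mut self, tenant: &str, day: u64, cost_cents: u64) -> bool {
        let entry = self.spent.entry(tenant.to_string()).or_insert((day, 0));
        if entry.0 != day {
            // A new day started: reset this tenant's running total.
            *entry = (day, 0);
        }
        match entry.1.checked_add(cost_cents) {
            Some(total) if total <= self.cap_cents => {
                entry.1 = total;
                true
            }
            _ => false,
        }
    }
}
```

A rejected spend is the point at which a production stack would fall back to a cheaper model or return a graceful error. Per-tenant caps that differ by plan would replace the single `cap_cents` with a lookup.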
How it all fits
A reference production agent setup:
See also
- Resilience: Retries, fallbacks, recovery.
- Observability: Where your runs land.
- Security: PII, tools, sandboxes.
- Caching: Don’t pay twice.