Threat surface
Three categories of risk you can mitigate at the framework level:| Risk | Mitigation |
|---|---|
| Sensitive data leaving your process in prompts | PiiRedactor / RegexRedactor middleware |
| Model calling tools you didn’t intend | ToolAllowList / ToolDenyList / approver |
| Tools fetching URLs they shouldn’t (SSRF, internal networks) | SSRF-protected HTTP client, allow-listed hosts |
| Tools executing arbitrary code | Sandbox Backends, PythonRepl with restricted env |
PII redaction
PiiRedactor masks known PII patterns (emails, phone numbers, credit-card-like sequences, common identifiers) before any prompt leaves Cognis:
RegexRedactor accepts your own patterns:
PiiRedactor for the obvious, RegexRedactor for your own patterns. To wire either into an AgentBuilder agent, see Middleware → Wiring middleware into an agent.
Tool deny-lists and allow-lists
Restrict which tools the model is offered at request time, via the middleware pipeline:LimitTools(n) caps how many tool definitions are sent in any single model call — useful when you have a large tool registry and want the model to focus.
For tool-call approval gating (require human sign-off before specific tools run, regardless of allow-list), use Approver + AgentBuilder::with_approver — see Human-in-the-loop.
Human-in-the-loop for the riskiest tools
For actions that should never run unattended — moving money, sending email, deleting data — use HITL approval. Even with allow-lists, anApprover ensures every sensitive call faces a human first.
SSRF protection
Tools that fetch URLs are SSRF-prone. The HTTP tool primitives incognis::tools::http (feature tools-http) protect against:
- requests to internal IP ranges (
10.0.0.0/8,172.16.0.0/12,192.168.0.0/16) - requests to link-local addresses (
169.254.0.0/16) - requests to localhost (
127.0.0.0/8,::1) - redirect chains that escape an allow-listed host
examples/resilience/ssrf_protection.rs.
Sandboxing tool execution
For tools that run code (Python, shell), use a sandboxBackend. The agent reads / writes through the backend; the sandbox enforces FS and network limits.
- The agent writes files (you don’t want it to write outside
./scratch). - The agent runs shell or Python (you don’t want it touching
~/.ssh).
Backend that shells out to your isolation provider.
Output filtering
Models sometimes emit content that shouldn’t go to users — internal reasoning, system-prompt leaks, training-data slips. For high-risk applications:- Strip CoT before sending to the user. Reasoning models emit
<thinking>blocks; filter them out. - Run a moderation pass. Use a smaller, fast model to classify the output before display.
- Cap message length.
CapMessageLengthmiddleware truncates if a runaway loop produces a megabyte of text.
Prompt injection
Cognis can’t fully prevent prompt injection — that’s an open problem in the field. But you can shrink the surface:- Don’t put untrusted text in the system prompt. System prompts should be your own.
- Quote untrusted input. Wrap user input in clearly delimited blocks (“USER MESSAGE START / END”) so the model has a chance to recognize boundaries.
- Restrict tool actions for untrusted contexts. A user-facing chat agent should not have a
delete_accounttool.
How it works
- Middleware runs in the agent loop, before each LLM call. PII redaction sees the rendered prompt; tool gates see the model’s tool calls.
- Approvers run between the model’s reply and tool dispatch. The model can’t bypass them; the dispatcher always asks first.
- Backends mediate file operations. Sandboxed implementations refuse paths outside the allow-list; the agent gets an error it can recover from.
- SSRF protection is in the HTTP client, not the framework. Bring your own client if you need different rules.
See also
Patterns → HITL approval
Approval workflow for sensitive tools.
Middleware
Full catalog including
EditPolicy and WorkspaceLister.Going to production
Deploying with these defaults on.