By default, every call to `agent.run` starts fresh. That's fine for stateless tools, but most apps want continuity — the user's name, what was just discussed, the long task the agent is partway through. Memory is what makes an agent feel like the same agent across turns.
How memory works
Three moving parts:

- The system prompt is your one-shot voice for the agent. Set it with `with_system_prompt`.
- The memory is a `&mut` store the agent reads at the start of each `run` and writes to as the conversation grows.
- The seed — what the memory chooses to show the model — may be a subset of what's been written. A `Buffer` shows everything; a `Window` shows only the last N; `SummaryBufferMemory` shows a running summary plus recent turns.
Adding memory means implementing `Memory` (or picking one of the built-ins) and passing it to `AgentBuilder`.
Quick example
`stateful()` keeps the memory across `run` calls. `stateless()` (the default) clears it between runs — useful when each request is independent.
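A minimal sketch of the shape this takes, assuming an async API. Only `AgentBuilder`, `with_system_prompt`, `stateful()`, `Buffer`, and `agent.run` are taken from this page; `Client::new`, `with_memory`, `build`, and the string return type of `run` are placeholder assumptions, not confirmed API.

```rust
use cognis::{AgentBuilder, Buffer, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // `Client::new`, `with_memory`, and `build` are assumed names.
    let client = Client::new(std::env::var("API_KEY")?);
    let mut agent = AgentBuilder::new(client)
        .with_system_prompt("You are a helpful assistant.")
        .with_memory(Buffer::default())
        .stateful() // keep memory alive across `run` calls
        .build()?;

    // The second run can answer because the first run's transcript was
    // appended to memory and is seeded back into the next run.
    agent.run("My name is Ada.").await?;
    let reply = agent.run("What's my name?").await?;
    println!("{reply}");
    Ok(())
}
```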
Choosing a memory variant
| Variant | Mental model | When to use |
|---|---|---|
| `Buffer` | “Keep everything.” | Short conversations; debugging. |
| `Window::new(n)` | “Last N messages.” | Bounded context, no summarization cost. |
| `TokenBufferMemory::new(max_tokens)` | “Trim to fit a token budget.” | Maximize what fits without paying to summarize. |
| `SummaryMemory::new(client, threshold)` | “Summarize every N turns.” | Long sessions; periodic compaction. |
| `SummaryBufferMemory::new(client, max_tokens)` | “Token-budgeted buffer with summarized overflow.” | Best general-purpose choice for long chats. |
| `EntityMemory` | “Track entities across turns.” | The user keeps mentioning specific people / places / things. |
| `KnowledgeGraphMemory` | “Build a triple store of facts.” | Agent should reason over relationships. |
| `VectorMemory` | “Embed and retrieve.” | Long-term knowledge; semantic recall over chat history. |
| `HybridMemory` | “Combine multiple memories.” | Recent buffer + vector recall + entity tracking, all at once. |
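A few of the constructors from the table, shown in isolation. Only the constructor names and their arguments come from the table above; the `cognis::` import paths and the numeric values are placeholders.

```rust
use cognis::{Client, SummaryBufferMemory, TokenBufferMemory, Window};

fn memory_examples(client: Client) {
    // Bounded context, no summarization cost: keep only the last 20 messages.
    let _window = Window::new(20);

    // Trim to a 4k-token budget without ever summarizing.
    let _budgeted = TokenBufferMemory::new(4_000);

    // Token-budgeted buffer; overflow is summarized via an LLM call on `client`.
    let _general_purpose = SummaryBufferMemory::new(client, 4_000);
}
```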
Pinning a system message inside memory
Some memories support a pinned system message that's always present at the start of the seed, regardless of trimming or summarization. `Buffer`, `Window`, `TokenBufferMemory`, and `SummaryBufferMemory` all accept this.
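A sketch of what pinning might look like on a `Window`; the `with_system_message` method name and the import path are assumptions here, not confirmed API.

```rust
use cognis::Window;

fn pinned_window() -> Window {
    // Keep the last 10 messages, plus a pinned system message that survives
    // trimming: it is always the first thing in the seed.
    Window::new(10)
        .with_system_message("You are a terse assistant. Answer in one sentence.")
}
```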
Customizing memory
If none of the built-ins fit, implement `Memory`.
Override `seed()` if your memory wants to project differently than it stores — e.g., embed-on-write but retrieve-relevant-on-seed.
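A deliberately small sketch of a custom implementation, just to show the shape: store everything, but seed only the most recent four messages (a hand-rolled `Window`). Only `seed()` is named on this page; the `append` write hook, the `Message` type (assumed `Clone`), and the exact signatures are assumptions.

```rust
use cognis::{Memory, Message};

/// Stores every message, but only ever seeds the most recent four.
struct LastFour {
    log: Vec<Message>,
}

impl Memory for LastFour {
    // Assumed write hook: receives the run's transcript after each run.
    fn append(&mut self, messages: &[Message]) {
        self.log.extend_from_slice(messages);
    }

    // The projection the model sees at the start of the next run.
    fn seed(&self) -> Vec<Message> {
        let start = self.log.len().saturating_sub(4);
        self.log[start..].to_vec()
    }
}
```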
How it works
- Memory lives on the agent, not on the graph state. That's why `stateful()` matters: it keeps the agent's `Memory` field alive across `run` calls.
- Memory is read at the start of each `run` (when stateful) to seed the conversation; the agent appends the run's transcript to memory after `run` completes, in one batch (see the sketch after this list). Memory is not updated during the loop.
- In stateless mode, memory is not written — every `run` is independent.
- Pinned system messages are always prepended — even if the rest of the memory has been summarized away.
- Summarization runs as an LLM call through the `client` you handed the memory. It costs tokens; budget for it.
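In pseudo-Rust, the lifecycle above looks roughly like this. It is a conceptual sketch, not the library's real internals; the `memory` field, `drive_loop`, `Message::user`, and `append` are all illustrative names.

```rust
// Conceptual sketch of one stateful `run`; every name here is illustrative.
async fn run_once(agent: &mut Agent, user_input: &str) -> anyhow::Result<String> {
    // 1. Seed: memory is read once, up front.
    let mut transcript = agent.memory.seed();
    transcript.push(Message::user(user_input));

    // 2. Loop: model and tool calls work on the transcript only;
    //    memory is not touched while the loop runs.
    let reply = agent.drive_loop(&mut transcript).await?;

    // 3. Append: the whole transcript is written back in one batch.
    //    In stateless mode this step is skipped.
    agent.memory.append(&transcript);
    Ok(reply)
}
```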
Backends and persistence
Memory state is in-process by default. For persistence across processes:

- Use the agent's `Backend` (`with_filesystem`) to write a markdown / JSON memory file the agent reads back (sketched below).
- Use `cognis-graph` checkpointers to persist the agent's graph state across processes — see Graph workflows → Checkpointing. Note that checkpointers persist what's in the graph state struct, not the agent's `Memory` backend internals; if your `Memory` keeps state outside the graph (e.g. in a `SummaryBufferMemory`'s running summary), persist that separately or hydrate it from a store of your choice on startup.
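A sketch of the filesystem backend wired into the same hypothetical builder as the quick example. `with_filesystem` is named on this page, but its directory argument, the `Agent` return type, and the other builder methods are assumptions.

```rust
use cognis::{Agent, AgentBuilder, Client};

fn persistent_agent(client: Client) -> anyhow::Result<Agent> {
    Ok(AgentBuilder::new(client)
        .with_system_prompt("You are a project assistant.")
        // Assumed signature: hand the filesystem backend a directory; the agent
        // writes its memory file (markdown / JSON) there and reads it back later.
        .with_filesystem("./agent-memory")
        .stateful()
        .build()?)
}
```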
See also
- Quickstart: Add memory to the calculator agent.
- Patterns → Stateful chat: A complete chat backend with memory and persistence.
- Middleware: `Summarization`, `ContextEditing`, `ContextInjection` for cross-cutting context shaping.