A vector store answers “give me chunks similar to this query.” A retriever is the layer above that — same job, more knobs. Cognis ships eight retrievers; most apps use one or two, occasionally combined. They all share the same shape: `Runnable<String, Vec<Document>>`.
## Pick a retriever
| Retriever | What it does | Use when |
|---|---|---|
| `VectorRetriever` | Vector similarity search over a `VectorStore`. | Default for embedding-based RAG. |
| `BM25Retriever` | Sparse keyword retrieval. | Exact-term recall (names, IDs, code). |
| `EnsembleRetriever` | Combines multiple retrievers with weights. | Hybrid dense + sparse. |
| `MultiVectorRetriever` | Index multiple vectors per document (summary + chunks). | Long docs where a summary embedding routes to chunk-level retrieval. |
| `ParentDocumentRetriever` | Retrieve small chunks; return the enclosing parents. | Want sharp matching but full-context generation. |
| `QueryTranslatorRetriever` | LLM rewrites the query before retrieval. | Vague user queries that need expansion. |
| `CompressorPipeline` | Chain of compressors (filter, rerank, summarize). | Post-process retrieved docs before they hit the model. |
| `CachingRetriever` | Wraps any retriever with a hash-keyed cache. | Repeated identical queries (chat with re-asks). |
All of these live in `cognis::retrievers::*` — they sit in the umbrella crate because they hold a `Client`.
## Quick example
The result is a `Vec<Document>`, ready to fold into a prompt or pass to the next stage.
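Cognis's real builder API isn't reproduced here. As a sketch of the shape every retriever shares (a function from query to `Vec<Document>`), here is a minimal top-k cosine-similarity retriever over pre-embedded docs. `Document`, `VectorRetriever`, and `invoke` are illustrative stand-ins, not the crate's actual types, and a real retriever would embed the query string itself rather than take a ready-made vector:

```rust
// Illustrative sketch: a retriever is conceptually query -> Vec<Document>.
#[derive(Clone, Debug)]
struct Document {
    content: String,
    embedding: Vec<f32>,
}

/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

struct VectorRetriever {
    docs: Vec<Document>,
    top_k: usize,
}

impl VectorRetriever {
    /// `top_k` is a request, not a guarantee: fewer docs may come back.
    fn invoke(&self, query_embedding: &[f32]) -> Vec<Document> {
        let mut scored: Vec<(f32, &Document)> = self
            .docs
            .iter()
            .map(|d| (cosine(query_embedding, &d.embedding), d))
            .collect();
        // Highest similarity first.
        scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
        scored.into_iter().take(self.top_k).map(|(_, d)| d.clone()).collect()
    }
}

fn main() {
    let docs = vec![
        Document { content: "rust retrievers".into(), embedding: vec![1.0, 0.0] },
        Document { content: "cooking pasta".into(), embedding: vec![0.0, 1.0] },
    ];
    let retriever = VectorRetriever { docs, top_k: 1 };
    let hits = retriever.invoke(&[0.9, 0.1]);
    println!("{}", hits[0].content); // the closest doc
}
```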
## Hybrid retrieval
Combine dense (vector) and sparse (BM25) retrieval for the best of both.
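Per the table above, `EnsembleRetriever` combines retrievers with weights. One common rule for merging a dense ranking with a sparse one is weighted Reciprocal Rank Fusion; the sketch below assumes each retriever yields a ranked list of doc IDs, and the fusion formula is an illustration, not necessarily what Cognis implements:

```rust
use std::collections::HashMap;

/// Weighted Reciprocal Rank Fusion: each doc earns weight / (k + rank)
/// from every ranking it appears in; higher total wins.
fn rrf(rankings: &[(f32, Vec<&str>)], k: f32) -> Vec<String> {
    let mut scores: HashMap<&str, f32> = HashMap::new();
    for (weight, ranking) in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += weight / (k + rank as f32 + 1.0);
        }
    }
    let mut merged: Vec<(&str, f32)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged.into_iter().map(|(id, _)| id.to_string()).collect()
}

fn main() {
    let dense = vec!["doc_a", "doc_b", "doc_c"];  // vector-similarity order
    let sparse = vec!["doc_c", "doc_a", "doc_d"]; // BM25 order
    // Equal weights; k = 60 is the conventional RRF damping constant.
    let merged = rrf(&[(0.5, dense), (0.5, sparse)], 60.0);
    println!("{:?}", merged);
}
```

A doc ranked well by both retrievers outranks one that only a single retriever liked, which is the point of hybrid retrieval.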
## Reranking

After initial retrieval, a cross-encoder can re-rank the top-K candidates by direct query-document scoring. `CrossEncoder` is a trait; bring your own scorer (a small reranker model, a heuristic, or a remote service).
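A minimal sketch of that bring-your-own-scorer design, with a term-overlap heuristic standing in for a real reranker model; the trait signature here is an assumption, not Cognis's actual definition:

```rust
/// Assumed shape of a cross-encoder: score (query, doc) pairs directly.
trait CrossEncoder {
    /// Higher score means more relevant.
    fn score(&self, query: &str, doc: &str) -> f32;
}

/// Toy heuristic scorer: count query terms appearing in the doc.
/// Stand-in for a small reranker model or a remote scoring service.
struct OverlapScorer;

impl CrossEncoder for OverlapScorer {
    fn score(&self, query: &str, doc: &str) -> f32 {
        let doc_lower = doc.to_lowercase();
        query
            .to_lowercase()
            .split_whitespace()
            .filter(|term| doc_lower.contains(*term))
            .count() as f32
    }
}

/// Re-rank candidates by direct query-document scoring, best first.
fn rerank<E: CrossEncoder>(encoder: &E, query: &str, mut docs: Vec<String>) -> Vec<String> {
    docs.sort_by(|a, b| {
        encoder
            .score(query, b)
            .partial_cmp(&encoder.score(query, a))
            .unwrap()
    });
    docs
}

fn main() {
    let candidates = vec![
        "retrievers pipe like runnables".to_string(),
        "bm25 keyword retrieval".to_string(),
    ];
    let top = rerank(&OverlapScorer, "bm25 retrieval", candidates);
    println!("{}", top[0]);
}
```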
## Filtering and metadata
Retrievers respect the metadata filters their underlying store supports.
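What a metadata filter means can be sketched as keeping only docs whose metadata matches every requested key/value pair; a capable store (Qdrant, Pinecone, Weaviate) applies this server-side rather than scanning. The types below are illustrative, not Cognis's:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Document {
    content: String,
    metadata: HashMap<String, String>,
}

/// Keep only docs whose metadata satisfies every key/value in `filter`.
/// An empty filter matches everything.
fn filter_docs<'a>(docs: &'a [Document], filter: &HashMap<String, String>) -> Vec<&'a Document> {
    docs.iter()
        .filter(|d| filter.iter().all(|(k, v)| d.metadata.get(k) == Some(v)))
        .collect()
}

fn main() {
    let docs = vec![
        Document {
            content: "v2 changelog".into(),
            metadata: HashMap::from([("source".to_string(), "docs".to_string())]),
        },
        Document {
            content: "forum post".into(),
            metadata: HashMap::from([("source".to_string(), "forum".to_string())]),
        },
    ];
    let filter = HashMap::from([("source".to_string(), "docs".to_string())]);
    let hits = filter_docs(&docs, &filter);
    println!("{} doc(s) match", hits.len());
}
```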
## Composing in a chain

Retrievers are `Runnable`s, so they pipe like anything else.
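A minimal sketch of that piping, with plain boxed functions standing in for stages like `QueryTranslatorRetriever` and `VectorRetriever`; Cognis's actual `Runnable` trait and pipe operator are not reproduced here:

```rust
/// Illustrative stand-in for a Runnable: a boxed function from A to B.
type Stage<A, B> = Box<dyn Fn(A) -> B>;

/// Piping two stages is ordinary function composition.
fn pipe<A, B, C>(f: Stage<A, B>, g: Stage<B, C>) -> Stage<A, C>
where
    A: 'static,
    B: 'static,
    C: 'static,
{
    Box::new(move |x| g(f(x)))
}

fn main() {
    // Stage 1: query translation (stand-in for an LLM rewrite).
    let translate: Stage<String, String> = Box::new(|q| format!("{} (expanded)", q));
    // Stage 2: retrieval (stand-in returning canned docs).
    let retrieve: Stage<String, Vec<String>> =
        Box::new(|q| vec![format!("doc matching: {}", q)]);

    let chain = pipe(translate, retrieve);
    let docs = chain("retrievers".to_string());
    println!("{:?}", docs);
}
```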
## How it works

- Retrievers compose. Layer caching, reranking, and translation by piping retrievers together.
- `top_k` is a request, not a guarantee. A store with fewer than `k` matching docs returns what it has.
- Filters happen at the store layer when possible. When the underlying backend can do it (Qdrant, Pinecone, Weaviate), it does — no scan-then-filter penalty.
- Caching is a thin shell. `CachingRetriever` keys on the query string; if your retriever takes a filter, two different filters with the same query are different cache entries.
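The caching idea above can be sketched as a wrapper keyed on the exact query string; the struct below is an illustration of the concept, not Cognis's implementation:

```rust
use std::collections::HashMap;

/// Illustrative caching wrapper: memoize any query -> docs function,
/// keyed on the exact query string.
struct CachingRetriever<F: Fn(&str) -> Vec<String>> {
    inner: F,
    cache: HashMap<String, Vec<String>>,
    hits: usize,
}

impl<F: Fn(&str) -> Vec<String>> CachingRetriever<F> {
    fn new(inner: F) -> Self {
        Self { inner, cache: HashMap::new(), hits: 0 }
    }

    fn invoke(&mut self, query: &str) -> Vec<String> {
        if let Some(docs) = self.cache.get(query) {
            self.hits += 1; // repeated identical query: no store round-trip
            return docs.clone();
        }
        let docs = (self.inner)(query);
        self.cache.insert(query.to_string(), docs.clone());
        docs
    }
}

fn main() {
    let mut retriever = CachingRetriever::new(|q: &str| vec![format!("doc for {}", q)]);
    retriever.invoke("what is bm25?");
    retriever.invoke("what is bm25?"); // served from cache
    println!("cache hits: {}", retriever.hits);
}
```

Because the key is the raw query string, any other input (a filter, a `top_k` change) must be folded into the key or it will alias distinct requests, which is why a filter-aware retriever gets one cache entry per (query, filter) pair.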
## See also

- Reranking and compression: cross-encoders, compressors, long-context reorder.
- Indexing pipeline: make sure the store has the right docs.
- Patterns → Code Q&A: a complete retriever-driven Q&A.