A vector store gives you “approximately relevant” docs. Retrieval-augmented quality usually wins or loses on what happens after — re-ranking the top-K with a model that scores query-document pairs directly, compressing redundant chunks, and ordering the final list to fight the lost-in-the-middle effect.
Why post-process
- Vector similarity is approximate. A bi-encoder (the embedder) scores query and doc separately. A cross-encoder reads them together and is consistently better at ranking — but too slow to use as the primary retriever.
- Top-K is noisy. Even after a cross-encoder, two of your K docs might be near-duplicates. Compression collapses them.
- LLMs lose the middle. Studies show models attend best to the start and end of context.
LongContextReorder puts the most-relevant chunks at the edges.
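For intuition, here is a toy, self-contained contrast between the two scoring styles. The keyword-count “embedding” and phrase-overlap bonus are invented stand-ins for illustration only; real encoders are learned models:

```rust
/// Stand-in "embedding": counts of a few hand-picked keywords.
/// Real bi-encoders produce dense learned vectors.
fn embed(text: &str) -> Vec<f32> {
    ["rust", "rerank", "vector"]
        .iter()
        .map(|kw| text.matches(kw).count() as f32)
        .collect()
}

/// Bi-encoder style: query and doc are embedded *independently*,
/// then compared (here, a dot product).
fn bi_encoder_score(query: &str, doc: &str) -> f32 {
    embed(query)
        .iter()
        .zip(embed(doc))
        .map(|(a, b)| a * b)
        .sum()
}

/// Cross-encoder style: reads the pair *together*, so it can capture
/// interactions the separate embeddings miss (here: exact phrase overlap).
fn cross_encoder_score(query: &str, doc: &str) -> f32 {
    let phrase_bonus = if doc.contains(query) { 1.0 } else { 0.0 };
    bi_encoder_score(query, doc) + phrase_bonus
}
```

The structural point survives the toy scoring: the bi-encoder can only compare two fixed vectors, while the cross-encoder sees the pair and can rank more precisely, at the cost of running a model per (query, doc) pair.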
Cross-encoder reranking
CrossEncoder is a trait — bring your own scorer. The simplest path is FnCrossEncoder wrapping a closure or an HTTP call to a hosted reranker.
CrossEncoderReranker calls the cross-encoder, reorders the docs by score, and trims to top-N (configurable).
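As a shape sketch of that step, assuming simplified stand-in definitions (the real CrossEncoder trait and CrossEncoderReranker in cognis may differ in signatures and fields):

```rust
// Simplified stand-ins for the shapes described above,
// not cognis's actual definitions.

struct Document {
    content: String,
}

/// Scores a (query, document) pair jointly.
trait CrossEncoder {
    fn score(&self, query: &str, doc: &Document) -> f32;
}

/// Wrap any closure (or an HTTP call to a hosted reranker) as a CrossEncoder.
struct FnCrossEncoder<F: Fn(&str, &Document) -> f32>(F);

impl<F: Fn(&str, &Document) -> f32> CrossEncoder for FnCrossEncoder<F> {
    fn score(&self, query: &str, doc: &Document) -> f32 {
        (self.0)(query, doc)
    }
}

/// Calls the cross-encoder, sorts descending by score, trims to top_n.
struct CrossEncoderReranker<E: CrossEncoder> {
    encoder: E,
    top_n: usize,
}

impl<E: CrossEncoder> CrossEncoderReranker<E> {
    fn rerank(&self, query: &str, mut docs: Vec<Document>) -> Vec<Document> {
        docs.sort_by(|a, b| {
            self.encoder
                .score(query, b)
                .total_cmp(&self.encoder.score(query, a))
        });
        docs.truncate(self.top_n);
        docs
    }
}
```

In practice the closure inside FnCrossEncoder would call a hosted reranker endpoint; the point is that the reranker only needs something that can score a pair.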
Compression pipeline
CompressorPipeline chains compressors that transform the retrieved docs — drop, rewrite, summarize, filter. Each compressor is a Runnable<Vec<Document>, Vec<Document>>, so the pipeline is just pipe-composed.
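A minimal sketch of the pipe-composition idea, using stand-in types (not cognis's actual Runnable machinery) and two toy compressor stages:

```rust
// Sketch only: each stage maps Vec<Document> -> Vec<Document>, and the
// pipeline is plain function composition. Names are illustrative stand-ins.
use std::collections::HashSet;

struct Document {
    content: String,
}

type Compressor = Box<dyn Fn(Vec<Document>) -> Vec<Document>>;

struct CompressorPipeline {
    stages: Vec<Compressor>,
}

impl CompressorPipeline {
    /// Run every stage in order, feeding each stage's output to the next.
    fn run(&self, docs: Vec<Document>) -> Vec<Document> {
        self.stages.iter().fold(docs, |acc, stage| stage(acc))
    }
}

/// Toy stage: drop exact-duplicate chunks.
fn dedupe(docs: Vec<Document>) -> Vec<Document> {
    let mut seen = HashSet::new();
    docs.into_iter()
        .filter(|d| seen.insert(d.content.clone()))
        .collect()
}

/// Toy stage: drop chunks shorter than a minimum length.
fn drop_short(docs: Vec<Document>) -> Vec<Document> {
    docs.into_iter().filter(|d| d.content.len() >= 10).collect()
}
```

Because every stage has the same input and output type, stages can be reordered or swapped freely, which is the property the Runnable framing buys you.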
Long-context reorder
When you do hand a list of chunks to the model, put the best ones at the edges.
Document transformers
Smaller transformations that don’t fit the “compressor” framing:

| Transformer | Effect |
|---|---|
| LongContextReorder | Edge-first ordering. |
| MetadataTransformer | Rewrite, filter, or attach metadata before docs reach the prompt. |
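The edge-first ordering that LongContextReorder performs can be sketched with the common alternating reorder. This is an illustration of the technique, not cognis's implementation:

```rust
// Illustration of edge-first reordering, not cognis's actual code.
use std::collections::VecDeque;

struct Document {
    content: String,
}

/// Input is sorted best-first. Output places the most relevant chunks at
/// the start and end of the list and the least relevant in the middle,
/// matching where LLMs attend best.
fn long_context_reorder(mut docs: Vec<Document>) -> Vec<Document> {
    docs.reverse(); // walk worst-first
    let mut out = VecDeque::new();
    for (i, doc) in docs.into_iter().enumerate() {
        if i % 2 == 0 {
            out.push_back(doc); // fills toward the tail edge
        } else {
            out.push_front(doc); // fills toward the head edge
        }
    }
    out.into_iter().collect()
}
```

For docs ranked 1 (best) through 5, this yields the order 2, 4, 5, 3, 1: the two best chunks sit at the edges and the worst sits in the middle.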
How a complete RAG chain looks
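The overall shape, with every stage stubbed out (these are stand-in functions showing the data flow, not cognis's API):

```rust
// End-to-end shape only; each function stands in for the real
// retriever / reranker / compressor / reorder components.

struct Document {
    content: String,
}

/// Stand-in vector-store lookup: fetch a generous candidate set.
fn retrieve(_query: &str, top_k: usize) -> Vec<Document> {
    (0..top_k)
        .map(|i| Document { content: format!("chunk {i}") })
        .collect()
}

/// Stand-in cross-encoder step: score, sort, trim to top_n.
fn rerank(_query: &str, mut docs: Vec<Document>, top_n: usize) -> Vec<Document> {
    docs.truncate(top_n);
    docs
}

/// Stand-in compressor pipeline: drop / rewrite / summarize.
fn compress(docs: Vec<Document>) -> Vec<Document> {
    docs
}

/// Stand-in LongContextReorder: best chunks to the edges.
fn reorder(docs: Vec<Document>) -> Vec<Document> {
    docs
}

fn rag_context(query: &str) -> Vec<Document> {
    let candidates = retrieve(query, 30); // wide net: retrieval is cheap
    let ranked = rerank(query, candidates, 5); // slow step, but only 30 docs
    reorder(compress(ranked)) // shrink, then order for the LLM
}
```

The key structural choice is visible in rag_context: the cheap retriever over-fetches, and every later stage only ever narrows or reorders that candidate set.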
How it works
- Each post-processor is a Runnable. Pipe in any order, swap implementations, wrap with retries — same surface as everything else.
- Reranking widens the candidate set. Set top_k on the retriever generously (20–50) and let the reranker trim. The vector store is fast at K=20; the reranker is the slow step but only runs over the candidates.
- LongContextReorder is order-only. It doesn’t drop or merge; it just shuffles for the LLM’s attention curve.
- Compressors can change document count. Make sure downstream code handles fewer-than-expected docs.
See also
Retrievers
The other half of the chain.
Patterns → Long-context summarization
A worked long-doc compression flow.
Reference → cognis-rag
Full transformer / compressor list.