If you’re shipping an LLM app to production, you need to know what it costs. Cognis computes USD cost on every model call using a PriceTable, attaches it to the corresponding span, and emits it as part of the trace. Defaults ship for the major providers; override per model when your contract differs.
What gets tracked
Every LLM call ends with a Usage carrying:
- input_tokens: what you sent.
- output_tokens: what you got back.
- cache_read_tokens: provider cache hits (Anthropic, etc.).
- cache_write_tokens: tokens you wrote to the cache.
Cognis multiplies each count by the matching ModelPrice rate and attaches cost: { input, output, cache_read, cache_write, total } to each generation span. In Langfuse, this lands on the costDetails field.
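The arithmetic behind that cost object is straightforward: tokens times per-bucket rate, summed into a total. A minimal sketch, assuming a per-token USD rate layout (the real ModelPrice and Usage types may differ in fields and units):

```python
from dataclasses import dataclass

@dataclass
class ModelPrice:
    # USD per token for each bucket. Per-token is an assumption here;
    # real price tables are often quoted per million tokens.
    input: float
    output: float
    cache_read: float
    cache_write: float

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

def compute_cost(usage: Usage, price: ModelPrice) -> dict:
    """Multiply each token bucket by its rate, then sum into a total."""
    cost = {
        "input": usage.input_tokens * price.input,
        "output": usage.output_tokens * price.output,
        "cache_read": usage.cache_read_tokens * price.cache_read,
        "cache_write": usage.cache_write_tokens * price.cache_write,
    }
    cost["total"] = sum(cost.values())
    return cost
```

Because the total is a plain sum over the per-bucket products, a span with only cache reads costs only the discounted cache-read rate.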
Defaults
with_default_pricing() loads a snapshot of public pricing for OpenAI, Anthropic, Google, Azure OpenAI, and OpenRouter as of 2026-05.
Custom pricing
Override per model when your rate differs (negotiated discount, custom Azure deployment, on-prem). override_price adds to (or replaces) entries in the table; use it for one-offs.
For full control, supply your own PriceTable:
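The override-versus-replace distinction can be sketched as a table keyed by model name, where an override is an add-or-replace on one entry. This is a hypothetical stand-in for the real PriceTable and override_price API, with made-up rates:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelPrice:
    input: float
    output: float
    cache_read: float = 0.0
    cache_write: float = 0.0

class PriceTable:
    """Hypothetical model: a mapping from model name to ModelPrice."""
    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def override_price(self, model: str, price: ModelPrice) -> None:
        # Adds a new entry, or replaces the existing one for that model.
        self._entries[model] = price

    def lookup(self, model: str) -> Optional[ModelPrice]:
        return self._entries.get(model)

# One-off override for a negotiated rate (all numbers illustrative):
table = PriceTable({"gpt-4o": ModelPrice(input=2.5e-6, output=1e-5)})
table.override_price("gpt-4o", ModelPrice(input=2.0e-6, output=8e-6))
```

Supplying your own table from scratch is the same shape: start from an empty mapping instead of the defaults snapshot and register only the models you run.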
Reading cost from observers
If you need cost in your own pipeline (custom alerting, budget enforcement) without going through Langfuse, observe OnEnd events on LLM Runnables and look at the attached Usage.
The ModelCallLimit middleware bounds call count; pair it with an observer that breaks on cost.
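A budget-breaking observer of that kind can be sketched as follows. The on_end hook name and the shape of the cost argument are assumptions; the point is the pattern of accumulating per-call totals and raising once a USD ceiling is crossed:

```python
class BudgetExceeded(RuntimeError):
    pass

class BudgetObserver:
    """Accumulates per-call USD cost and raises past a fixed budget.
    Hypothetical sketch; not the real Cognis observer interface."""
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def on_end(self, cost: dict) -> None:
        # Assumed: called after each LLM call with the attached cost dict.
        self.spent_usd += cost.get("total", 0.0)
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.budget_usd:.4f} budget"
            )

observer = BudgetObserver(budget_usd=0.01)
observer.on_end({"total": 0.004})      # under budget, passes
tripped = False
try:
    observer.on_end({"total": 0.007})  # pushes past the ceiling
except BudgetExceeded:
    tripped = True
```

Because cost is computed at call end (see below), this check fires in-band, before the next call is issued.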
How it works
- Cost is computed at on_llm_end time, not at trace export time. That means it’s available to in-band observers, not just Langfuse.
- Cache reads are counted at their discounted rate. A cached call is dramatically cheaper; pricing reflects that.
- Unknown models default to zero. No model name match → no cost. Add overrides for your own/local models if you want them costed.
- Computation is exact, not estimated. Token counts come from the provider response (or the streaming aggregator), not from a tokenizer running on your text.
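The unknown-model fallback and the cache discount above can be sketched together as a lookup that returns zero rates when the model name has no entry. Model names and rates here are illustrative, not real pricing:

```python
ZERO = {"input": 0.0, "output": 0.0, "cache_read": 0.0, "cache_write": 0.0}

# Per-token USD rates; numbers are made up for illustration.
PRICES = {
    "known-model": {
        "input": 1e-6,
        "output": 4e-6,
        "cache_read": 1e-7,    # steep discount on cache hits
        "cache_write": 1.25e-6,
    },
}

def cost_for(model: str, tokens: dict) -> float:
    rates = PRICES.get(model, ZERO)  # no name match -> zero cost
    return sum(tokens.get(bucket, 0) * rate for bucket, rate in rates.items())

full   = cost_for("known-model", {"input": 10_000})
cached = cost_for("known-model", {"cache_read": 10_000})
none   = cost_for("my-local-model", {"input": 10_000})  # uncosted until overridden
```

The same 10,000 tokens cost a tenth as much when served from cache at these sketch rates, and a local model stays at zero until you add an override for it.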
See also
Trace with Langfuse
Where cost lands by default.
Middleware
ModelCallLimit, RateLimit, and other budget guards.
Production → Going to production
Pricing and runtime tuning.