Once your traces are flowing into Langfuse, two more things become useful:
  • Versioned prompts — keep the prompt text out of your code, change it without redeploying, A/B test in production.
  • Evaluation scores — record how well a run did and tie it back to the trace. Build dashboards, regression alarms, and rolling quality checks.
Both are implemented in cognis-trace and gated behind the langfuse feature flag.
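
To turn these on, enable the feature in Cargo.toml. A minimal sketch; the version number is illustrative, not a pinned release:

[dependencies]
cognis-trace = { version = "0.1", features = ["langfuse"] }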

Versioned prompts

use cognis_trace::exporters::langfuse::{LangfuseConfig, LangfusePromptClient};

let client = LangfusePromptClient::new(LangfuseConfig::from_env()?)?;

let latest = client.get("greeting").await?;            // newest version
let pinned = client.get_version("greeting", 3).await?; // exact version
let prod   = client.get_label("greeting", "production").await?;
Prompt carries:

Field     Type                Notes
name      String              Stable identifier.
version   u32                 Monotonic.
body      PromptBody          Either text(String) or chat(Vec<Message>).
config    serde_json::Value   Free-form metadata you maintain alongside the prompt.
labels    Vec<String>         E.g., production, staging, experiment-a.
When you wire a fetched prompt into a chain, the TracingHandler automatically stamps prompt_name and prompt_version on the resulting generation span — so you can filter “all calls using prompt v3” in Langfuse.
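
What that wiring looks like depends on your chain, but unpacking the fetched body is the same everywhere. A minimal sketch, assuming PromptBody is exported alongside the client and that the variant and field names below match the table above (both assumptions):

use cognis_trace::exporters::langfuse::PromptBody; // path assumed

let prompt = client.get_label("greeting", "production").await?;

let rendered = match prompt.body {
    // Plain-text prompt: use the body as-is.
    PromptBody::Text(text) => text,
    // Chat prompt: most chains take the messages directly; joining
    // their contents here is only for illustration.
    PromptBody::Chat(messages) => messages
        .into_iter()
        .map(|m| m.content) // field name assumed
        .collect::<Vec<_>>()
        .join("\n"),
};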

Submitting scores

Every run gets a run_id. Score it any time — during the run (in-band) or later (out-of-band).
Submit a score directly through the handler:
use cognis_trace::{ScoreRecord, ScoreValue};

handler.record_score(ScoreRecord {
    run_id,                                // ties the score to this run's trace
    trace_id: None,                        // optional: pin to a specific trace
    session_id: None,                      // optional: pin to a session
    name: "novelty".into(),
    value: ScoreValue::Numeric(0.87),
    comment: Some("eval pipeline".into()),
});
Useful for synchronous, in-band evals: the eval ran, you have the answer, attach it to the trace before the user sees the response.

Score values

ScoreValue            Use for
Numeric(f64)          Quality, novelty, helpfulness: anything on a continuous scale.
Categorical(String)   "good" / "bad" / "skip", or any custom label set.
Boolean(bool)         Binary thumbs up/down.
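
All three ride in the same ScoreRecord shown above; only value changes. For example, recording end-user feedback as a boolean (names and values here are illustrative):

handler.record_score(ScoreRecord {
    run_id,
    trace_id: None,
    session_id: None,
    name: "thumbs_up".into(),
    value: ScoreValue::Boolean(true),
    comment: None,
});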

Bring-your-own backend

If Langfuse isn’t your eval backend, implement ScoreSink:
use async_trait::async_trait;
use cognis_trace::{ScoreRecord, ScoreSink};

struct MyScorer;

#[async_trait]
impl ScoreSink for MyScorer {
    async fn submit(&self, record: ScoreRecord) -> Result<(), cognis_trace::TraceError> {
        // POST to your service.
        todo!()
    }
}
Pass Arc::new(MyScorer) anywhere a ScoreSink is expected. The trait is one method.
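
To make that concrete, here is a sketch of a sink that forwards every record to an internal HTTP endpoint. It assumes reqwest with its json feature, that ScoreRecord implements serde::Serialize, and an invented URL; since submission is best-effort (see below), transport failures are logged rather than propagated:

use async_trait::async_trait;
use cognis_trace::{ScoreRecord, ScoreSink, TraceError};

struct HttpScorer {
    http: reqwest::Client,
    endpoint: String, // e.g. "https://evals.internal/scores" (hypothetical)
}

#[async_trait]
impl ScoreSink for HttpScorer {
    async fn submit(&self, record: ScoreRecord) -> Result<(), TraceError> {
        // Best-effort, like the built-in Langfuse scorer: log transport
        // failures instead of propagating them.
        if let Err(err) = self
            .http
            .post(self.endpoint.as_str())
            .json(&record) // assumes ScoreRecord: serde::Serialize
            .send()
            .await
        {
            eprintln!("score submission failed: {err}");
        }
        Ok(())
    }
}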

How it works

  • Prompt fetches are HTTP calls. Cache them locally if your access pattern is “fetch on every request”: client.get is fast, but it is still a network round-trip. A minimal cache sketch follows this list.
  • Score submission is async and best-effort. The Langfuse scorer batches in the background like the trace exporter; failures are logged.
  • Run IDs link the world. A trace, its scores, the prompt version it used — all keyed on the same run_id Cognis generated for the run.
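
A minimal TTL cache over the prompt client, per the first point above. The Prompt type's path, a Clone impl, and the client's error type are assumptions; adjust to what the crate actually exports:

use std::collections::HashMap;
use std::time::{Duration, Instant};

use cognis_trace::exporters::langfuse::{LangfusePromptClient, Prompt}; // Prompt path assumed
use cognis_trace::TraceError;

struct PromptCache {
    ttl: Duration,
    entries: HashMap<String, (Prompt, Instant)>,
}

impl PromptCache {
    async fn get(
        &mut self,
        client: &LangfusePromptClient,
        name: &str,
    ) -> Result<Prompt, TraceError> {
        // Serve from cache while the entry is still fresh.
        if let Some((prompt, fetched_at)) = self.entries.get(name) {
            if fetched_at.elapsed() < self.ttl {
                return Ok(prompt.clone()); // assumes Prompt: Clone
            }
        }
        // Missing or stale: pay the round-trip once, then refresh.
        let prompt = client.get(name).await?;
        self.entries
            .insert(name.to_owned(), (prompt.clone(), Instant::now()));
        Ok(prompt)
    }
}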

See also

  • Evaluation: run evals over a dataset and feed scores back.
  • Trace with Langfuse: where the runs themselves land.