Once your traces are flowing into Langfuse, two more things become useful:
  • Versioned prompts — keep the prompt text out of your code, change it without redeploying, A/B test in production.
  • Evaluation scores — record how well a run did and tie it back to the trace. Build dashboards, regression alarms, and rolling quality checks.
Both are implemented in cognis-trace and gated behind the langfuse feature flag.
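
To turn these on, enable the feature in Cargo.toml. A minimal sketch; the version number is illustrative, not a pinned release:

[dependencies]
cognis-trace = { version = "0.1", features = ["langfuse"] }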

Versioned prompts

use cognis_trace::exporters::langfuse::{LangfuseConfig, LangfusePromptClient};

let client = LangfusePromptClient::new(LangfuseConfig::from_env()?)?;

let latest = client.get("greeting").await?;            // newest version
let pinned = client.get_version("greeting", 3).await?; // exact version
let prod   = client.get_label("greeting", "production").await?;
Prompt carries:

Field     Type                Notes
name      String              Stable identifier.
version   u32                 Monotonic.
body      PromptBody          Either text(String) or chat(Vec<Message>).
config    serde_json::Value   Free-form metadata you maintain alongside the prompt.
labels    Vec<String>         E.g., production, staging, experiment-a.
When you wire a fetched prompt into a chain, the TracingHandler automatically stamps prompt_name and prompt_version on the resulting generation span — so you can filter “all calls using prompt v3” in Langfuse.
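
What that wiring looks like depends on your chain, but unpacking the fetched body is the same everywhere. A minimal sketch, assuming PromptBody is exported alongside the client and that the variant and field names below match the table above (both assumptions):

use cognis_trace::exporters::langfuse::PromptBody; // path assumed

let prompt = client.get_label("greeting", "production").await?;

let rendered = match prompt.body {
    // Plain-text prompt: use the body as-is.
    PromptBody::Text(text) => text,
    // Chat prompt: most chains take the messages directly; joining
    // their contents here is only for illustration.
    PromptBody::Chat(messages) => messages
        .into_iter()
        .map(|m| m.content) // field name assumed
        .collect::<Vec<_>>()
        .join("\n"),
};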

Submitting scores

Every run gets a run_id. Score it any time — during the run (in-band) or later (out-of-band).
Submit a score directly through the handler:
use cognis_trace::{ScoreRecord, ScoreValue};

handler.record_score(ScoreRecord {
    run_id,                                // ties the score to this run's trace
    trace_id: None,                        // optional: pin to a specific trace
    session_id: None,                      // optional: pin to a session
    name: "novelty".into(),
    value: ScoreValue::Numeric(0.87),
    comment: Some("eval pipeline".into()),
});
Useful for synchronous, in-band evals: the eval ran, you have the answer, attach it to the trace before the user sees the response.

Score values

ScoreValue            Use for
Numeric(f64)          Quality, novelty, helpfulness: anything on a continuous scale.
Categorical(String)   "good" / "bad" / "skip", or any custom label set.
Boolean(bool)         Binary thumbs up/down.
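
All three ride in the same ScoreRecord shown above; only value changes. For example, recording end-user feedback as a boolean (names and values here are illustrative):

handler.record_score(ScoreRecord {
    run_id,
    trace_id: None,
    session_id: None,
    name: "thumbs_up".into(),
    value: ScoreValue::Boolean(true),
    comment: None,
});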

Bring-your-own backend

If Langfuse isn’t your eval backend, implement ScoreSink:
use async_trait::async_trait;
use cognis_trace::{ScoreRecord, ScoreSink};

struct MyScorer;

#[async_trait]
impl ScoreSink for MyScorer {
    async fn submit(&self, record: ScoreRecord) -> Result<(), cognis_trace::TraceError> {
        // POST to your service.
        todo!()
    }
}
Pass Arc::new(MyScorer) anywhere a ScoreSink is expected. The trait is one method.
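
To make that concrete, here is a sketch of a sink that forwards every record to an internal HTTP endpoint. It assumes reqwest with its json feature, that ScoreRecord implements serde::Serialize, and an invented URL; since submission is best-effort (see below), transport failures are logged rather than propagated:

use async_trait::async_trait;
use cognis_trace::{ScoreRecord, ScoreSink, TraceError};

struct HttpScorer {
    http: reqwest::Client,
    endpoint: String, // e.g. "https://evals.internal/scores" (hypothetical)
}

#[async_trait]
impl ScoreSink for HttpScorer {
    async fn submit(&self, record: ScoreRecord) -> Result<(), TraceError> {
        // Best-effort, like the built-in Langfuse scorer: log transport
        // failures instead of propagating them.
        if let Err(err) = self
            .http
            .post(self.endpoint.as_str())
            .json(&record) // assumes ScoreRecord: serde::Serialize
            .send()
            .await
        {
            eprintln!("score submission failed: {err}");
        }
        Ok(())
    }
}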

How it works

  • Prompt fetches are HTTP calls. Cache them locally if your access pattern is “fetch on every request”: client.get is fast, but it is still a network round-trip. A minimal cache sketch follows this list.
  • Score submission is async and best-effort. The Langfuse scorer batches in the background like the trace exporter; failures are logged.
  • Run IDs link the world. A trace, its scores, the prompt version it used — all keyed on the same run_id Cognis generated for the run.
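
A minimal TTL cache over the prompt client, per the first point above. The Prompt type's path, a Clone impl, and the client's error type are assumptions; adjust to what the crate actually exports:

use std::collections::HashMap;
use std::time::{Duration, Instant};

use cognis_trace::exporters::langfuse::{LangfusePromptClient, Prompt}; // Prompt path assumed
use cognis_trace::TraceError;

struct PromptCache {
    ttl: Duration,
    entries: HashMap<String, (Prompt, Instant)>,
}

impl PromptCache {
    async fn get(
        &mut self,
        client: &LangfusePromptClient,
        name: &str,
    ) -> Result<Prompt, TraceError> {
        // Serve from cache while the entry is still fresh.
        if let Some((prompt, fetched_at)) = self.entries.get(name) {
            if fetched_at.elapsed() < self.ttl {
                return Ok(prompt.clone()); // assumes Prompt: Clone
            }
        }
        // Missing or stale: pay the round-trip once, then refresh.
        let prompt = client.get(name).await?;
        self.entries
            .insert(name.to_owned(), (prompt.clone(), Instant::now()));
        Ok(prompt)
    }
}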

See also

  • Evaluation: run evals over a dataset and feed scores back.
  • Trace with Langfuse: where the runs themselves land.