Langfuse
An open-source LLM engineering platform for tracing, evals, prompt management, and metrics.
Langfuse is an open-source LLM engineering platform combining tracing, evaluations, prompt management, and cost/latency metrics. Self-host it or use the managed cloud; it's framework-agnostic and a popular open alternative to LangSmith.
Langfuse is an open-source LLM engineering platform that brings tracing, evaluation, prompt management, and metrics together. It captures detailed traces of your LLM and agent runs, lets you score them (manually, with LLM-as-judge, or via user feedback), manages and versions prompts, and tracks cost and latency — all in a tool you can self-host or run as a managed cloud.
It is aimed at teams who want a vendor-neutral, open-source backbone for LLM observability and evals, with the option of self-hosting for privacy or cost control. It is framework-agnostic and integrates broadly across the LLM tooling ecosystem.
Highlights
- Tracing — nested traces of LLM calls, tool calls, and agent steps, with cost and latency per span.
- Evaluations — LLM-as-judge, manual scoring, and user-feedback signals on traced runs.
- Prompt management — version, deploy, and A/B prompts without redeploying your app.
- Metrics & dashboards — quality, cost, and latency over time, sliced by version or user.
- Self-host or cloud — run it entirely in your own environment, or use the managed service.
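The tracing and metrics bullets above rest on one idea: each span carries its own latency and cost, and a trace's totals are the sum over its tree. Here is a minimal sketch of that roll-up, in plain Python rather than the Langfuse SDK. The `Span` class, its field names, and the dollar figures are hypothetical, and child spans are assumed to run one after another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One step in a trace: an LLM call, a tool call, or an agent step (hypothetical model)."""
    name: str
    latency_ms: float = 0.0  # time spent in this span itself, excluding children
    cost_usd: float = 0.0    # cost attributed to this span itself
    children: List["Span"] = field(default_factory=list)

    def total_cost(self) -> float:
        # Cost of this span plus everything nested under it.
        return self.cost_usd + sum(c.total_cost() for c in self.children)

    def total_latency(self) -> float:
        # Assumes children run sequentially; parallel steps would need max(), not sum().
        return self.latency_ms + sum(c.total_latency() for c in self.children)


# Illustrative numbers only.
trace = Span("answer_question", latency_ms=5.0, children=[
    Span("retrieve_docs", latency_ms=120.0),
    Span("llm_call", latency_ms=800.0, cost_usd=0.0125),
    Span("agent_step", latency_ms=10.0, children=[
        Span("tool_call", latency_ms=50.0),
        Span("llm_call", latency_ms=400.0, cost_usd=0.005),
    ]),
])

print(f"{trace.name}: {trace.total_latency():.0f} ms, ${trace.total_cost():.4f}")
```

A real tracing backend records these values for you; the sketch only shows why per-span numbers are enough to answer trace-level questions.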
In an AI-assisted workflow
Instrument your app with the SDK (or an OpenTelemetry integration). Traces, costs, and scores then flow into Langfuse, where you can build datasets and run evals against real traffic.
from langfuse import observe

@observe()
def answer(question: str) -> str:
    ...  # traced automatically: inputs, outputs, latency, cost
TIP
Manage prompts in Langfuse rather than in code: you can iterate and roll back prompt versions in production without a deploy, and tie each version to its eval scores.
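To make the tip concrete, here is a toy in-memory sketch of versioned prompts with a movable "production" pointer. It is not the Langfuse API: `PromptRegistry`, its methods, and the prompt name are all hypothetical. It only illustrates why the app can change which prompt it serves without a code deploy.

```python
from typing import Dict, List


class PromptRegistry:
    """Toy stand-in for a prompt store (hypothetical, not Langfuse's API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}  # name -> templates; list index + 1 is the version
        self._production: Dict[str, int] = {}      # name -> version currently served

    def create(self, name: str, template: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def promote(self, name: str, version: int) -> None:
        """Point production at an existing version (also used for rollback)."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        self._production[name] = version

    def get(self, name: str) -> str:
        """Return the template currently marked as production."""
        return self._versions[name][self._production[name] - 1]


registry = PromptRegistry()
v1 = registry.create("support-reply", "Answer briefly: {question}")
v2 = registry.create("support-reply", "Answer briefly and cite sources: {question}")

question = "How do I reset my password?"

registry.promote("support-reply", v2)  # ship the new version
served_v2 = registry.get("support-reply").format(question=question)

registry.promote("support-reply", v1)  # roll back; the app code is unchanged
served_v1 = registry.get("support-reply").format(question=question)
```

The application only ever asks for the production prompt by name, so promotion and rollback happen in the store rather than in the codebase.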
Good to know
Langfuse is open source (MIT) and free to self-host; a managed cloud with a free tier and paid plans is also available. Judge-based evals call an LLM, so you need to connect your own LLM provider. For comparison, see the commercial LangSmith and Braintrust, and the OTel-native Arize Phoenix.
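Because the judge model is something you supply, an LLM-as-judge eval is easiest to picture as a function that takes the judge as a parameter. The sketch below is an assumption-laden illustration, not Langfuse's evaluator: `judge_score`, the 1-to-5 rubric, and `fake_judge` (a stand-in for a real provider call) are all hypothetical.

```python
from typing import Callable, Dict, List

# A judge is any callable that takes a grading prompt and returns the model's raw text.
Judge = Callable[[str], str]


def judge_score(question: str, answer: str, judge: Judge) -> float:
    """Ask the judge for a 1-5 rating and normalise it to the range 0.0-1.0."""
    prompt = (
        "Rate from 1 to 5 how well the answer addresses the question. "
        "Reply with a single digit.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    reply = judge(prompt).strip()
    digits = [ch for ch in reply if ch in "12345"]
    if not digits:
        raise ValueError(f"judge reply contained no rating: {reply!r}")
    return (int(digits[0]) - 1) / 4


def fake_judge(prompt: str) -> str:
    # Stand-in for a real LLM call so the example runs offline.
    return "4" if "Paris" in prompt else "1"


# Hypothetical traced runs, tagged with the prompt version that produced them.
runs: List[Dict[str, str]] = [
    {"version": "v1", "question": "Capital of France?", "answer": "I am not sure."},
    {"version": "v2", "question": "Capital of France?", "answer": "Paris."},
]

scores = {
    run["version"]: judge_score(run["question"], run["answer"], fake_judge)
    for run in runs
}
```

Swapping `fake_judge` for a function that calls your provider is the only change needed to score real runs; keeping the judge injectable also makes the scoring logic easy to unit-test.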
Related
- Arize Phoenix: An open-source LLM observability and evaluation tool built on OpenTelemetry, runnable anywhere.
- LangSmith: LangChain's platform for tracing, evaluating, and monitoring LLM apps; framework-agnostic.
- Best LLM & RAG Evaluation Tools in 2026: DeepEval vs RAGAS vs LangSmith vs Phoenix vs promptfoo: A decision guide to the LLM eval landscape, covering code-first frameworks vs. eval-and-observability platforms, open-source vs. hosted, and which fits your stack.
- LLM Observability Engineer: Use this agent to make a production LLM app observable (tracing every step, scoring live traffic with online evals, and monitoring quality, cost, and latency) so you can debug agent runs and catch regressions in production. Examples: "add tracing to our RAG/agent so we can debug bad answers", "set up online evals and cost/latency dashboards", "production quality is slipping and we're flying blind".
- AgentOps: Observability for AI agents, with session replay, cost and latency tracking, and debugging for multi-step runs.
- Helicone: Open-source LLM observability and AI gateway with one-line integration, covering logging, tracing, caching, and cost/latency tracking across providers.
- RAGAS: An open-source framework for evaluating retrieval-augmented generation with reference-free RAG metrics.