Langfuse vs. LangSmith: Comparing LLM Observability Platforms

Compare Langfuse and LangSmith across tracing, evaluation, observability, prompt management, and production monitoring to choose the right platform for your LLM applications.

Jun 24, 2026 · 13 min read

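When a chatbot starts giving bad answers, the instinct is to check the prompt. That works fine for a single LLM call. It stops working when the application is an agent that chains LLM calls and tool calls, because the bad answer could come from any step, and the prompt alone does not show you which one.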

That missing context is what LLM observability platforms try to provide. They are not traditional application monitoring tools. A traditional tool tells you about latency and error rates; an LLM observability platform also tells you which tool call returned a bad result and whether a prompt change improved output quality.

Both Langfuse and LangSmith cover tracing, evaluation, and prompt management, and both released major updates in early 2026. They're not interchangeable, though. The difference comes down to deployment requirements, tech stack, and how your team runs evaluations.

Short answer: Langfuse fits teams that need open-source self-hosting, data control, or a stack outside LangChain. LangSmith fits teams already building with LangChain or LangGraph, but it is no longer limited to that ecosystem. If neither condition is true, I would look at pricing.

What Are Langfuse and LangSmith?

At a high level, both products make LLM applications observable, testable, and debuggable. Here's what each one is.

Langfuse versus LangSmith platform positioning overview. Image by Author.

What is Langfuse?

Langfuse is an open-source LLM engineering platform that launched in 2023. It covers tracing, prompt management, evaluation (LLM-as-judge, human annotation, and code-based checks), dataset experiments, and cost and latency monitoring. The core open-source product is MIT licensed.

In January 2026, ClickHouse announced a $400 million Series D and acquired Langfuse. Langfuse is now part of ClickHouse, the columnar database that already powered the Langfuse backend. The MIT license and open-source identity were confirmed as unchanged at the time.

Langfuse runs as a managed cloud service with US, EU, and Japan regions, or as a self-hosted open-source instance with no software license cost.

What is LangSmith?

LangSmith is the observability and evaluation platform built by LangChain Inc., the team behind LangChain and LangGraph. The platform is proprietary and closed-source. LangChain raised $125 million at a $1.25 billion valuation in October 2025.

Its main capabilities include end-to-end tracing of application runs, visual debugging, automated evaluations, production monitoring, and prompt management through Prompt Hub and the Playground. In May 2026, LangChain launched SmithDB, a Rust-based data layer that now handles 100% of LangSmith's US Cloud ingestion. With SmithDB, P50 trace tree load time drops to 92 milliseconds and full-text search time to 400 milliseconds.

LangSmith is available as a managed cloud service, a hybrid deployment with a customer VPC data plane, or a self-hosted Enterprise deployment.

Open Source vs. Managed SaaS

The core difference between the two platforms is not "open source versus not open source." The real difference is control and portability on one side, and LangChain/LangGraph fit on the other. Langfuse lets you run the stack on your own infrastructure with no licensing cost. LangSmith needs less setup when your application already runs on LangChain or LangGraph.

One update changes how this comparison should be framed: LangSmith now supports OpenTelemetry tracing through the langsmith[otel] package and the LANGSMITH_OTEL_ENABLED=true environment variable. LangSmith is no longer limited to LangChain-only applications. Its closest integration remains with LangGraph, as I will cover in the tracing section.

Here's where the two platforms sit structurally:

Dimension	Langfuse	LangSmith
Source model	Open source (MIT)	Proprietary, closed source
Self-hosting	Free MIT self-hosting; enterprise controls paid	Enterprise contract required
Framework approach	Works across frameworks; broad integrations; OTel native	Closest fit for LangChain/LangGraph; OTel support
Data sovereignty	Full; air-gapped deployment possible	Hybrid and self-hosted for Enterprise customers
Backend database	ClickHouse	SmithDB (Rust/DataFusion)
Pricing model	Unit-based (traces + observations + scores)	Seat-based plus trace-based with dual retention tiers
Compliance	SOC 2 Type II, ISO 27001, GDPR, HIPAA	SOC 2 Type II, GDPR, HIPAA

The rest of the article unpacks what those differences mean in practice.

Tracing and Observability

Tracing is where the products start to separate. Both capture LLM calls, tool calls, and related metadata, but agent workflows expose the differences faster than simple prompt-response apps do.

Request tracing

Langfuse builds hierarchical traces that capture LLM calls, tool invocations, embeddings, and retrieval steps. You can filter by user, session, cost, latency, or custom metadata. In May 2026, Langfuse added full-text search backed by ClickHouse's native FTS engine, cutting searches that previously took close to 20 seconds to under half a second.
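To make "hierarchical trace" concrete, here is a minimal sketch of the idea in plain Python: a trace holds a root observation, observations nest, and cost is rolled up recursively before filtering. The Observation and Trace classes and the filter_traces function are hypothetical illustrations, not the Langfuse SDK or its query API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """One step in a trace: an LLM call, tool call, retrieval, embedding, etc."""
    name: str
    kind: str  # e.g. "generation", "tool", "retrieval", "span"
    cost_usd: float = 0.0
    latency_ms: float = 0.0
    children: list["Observation"] = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    user_id: str
    session_id: str
    metadata: dict[str, str]
    root: Observation


def total_cost(obs: Observation) -> float:
    """Sum the cost of an observation and all of its nested children."""
    return obs.cost_usd + sum(total_cost(child) for child in obs.children)


def filter_traces(traces: list[Trace], user_id: Optional[str] = None,
                  min_cost: Optional[float] = None,
                  metadata: Optional[dict[str, str]] = None) -> list[Trace]:
    """Keep traces that match every filter that is set; None means 'no filter'."""
    matches = []
    for trace in traces:
        if user_id is not None and trace.user_id != user_id:
            continue
        if min_cost is not None and total_cost(trace.root) < min_cost:
            continue
        if metadata and any(trace.metadata.get(k) != v for k, v in metadata.items()):
            continue
        matches.append(trace)
    return matches


# Usage: a RAG request with a retrieval step and an LLM call nested under one root span.
search = Observation("vector_search", "retrieval", cost_usd=0.0004, latency_ms=120)
answer = Observation("answer", "generation", cost_usd=0.012, latency_ms=900)
trace = Trace("t-1", "user-42", "s-1", {"env": "prod"},
              Observation("rag_pipeline", "span", children=[search, answer]))
print(len(filter_traces([trace], user_id="user-42", min_cost=0.01)))  # 1
```

The point of the nesting is that a cost or latency filter applies to the whole request, even when the expensive step is buried several levels down.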

LangSmith captures every LLM call and tool use as an inspectable run tree. With SmithDB now handling all US Cloud ingestion, trace trees load at P50 in 92 milliseconds. LangSmith also includes unsupervised topic clustering, which groups traces by detected theme and gives teams a starting point when they have no idea what's wrong.

Agent workflow visibility

Langfuse added Agent Graphs in November 2025, visualizing execution flow for multi-step agents by inferring graph structure from observation timings and nesting. It works with any instrumented framework, with native LangGraph support included. A Trace Log View was added at the same time, giving a flat stream of agent steps for workflows that loop or branch heavily.
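Inferring a graph "from observation timings and nesting" can be illustrated with a simple heuristic: nesting gives parent-to-child edges, and consecutive siblings that do not overlap in time are treated as sequential steps. This is a hypothetical sketch of the general idea under those assumptions; it is not Langfuse's actual algorithm, and the Obs class and infer_edges function are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obs:
    obs_id: str
    parent_id: Optional[str]  # None for the root observation
    start: float              # seconds since the trace started
    end: float


def infer_edges(observations: list[Obs]) -> set[tuple[str, str, str]]:
    """Infer (source, target, relation) edges from nesting and timing."""
    edges: set[tuple[str, str, str]] = set()
    children: dict[str, list[Obs]] = defaultdict(list)
    for obs in observations:
        if obs.parent_id is not None:
            children[obs.parent_id].append(obs)
            edges.add((obs.parent_id, obs.obs_id, "contains"))  # nesting
    for siblings in children.values():
        siblings.sort(key=lambda o: o.start)
        for prev, nxt in zip(siblings, siblings[1:]):
            # Consecutive siblings that do not overlap in time are treated as
            # sequential steps; overlapping siblings are treated as parallel.
            if prev.end <= nxt.start:
                edges.add((prev.obs_id, nxt.obs_id, "then"))
    return edges


# Usage: an agent that plans, calls a search tool, then answers.
steps = [
    Obs("agent", None, 0.0, 9.0),
    Obs("plan", "agent", 0.0, 2.0),
    Obs("search", "agent", 2.0, 5.0),
    Obs("answer", "agent", 5.0, 9.0),
]
for edge in sorted(infer_edges(steps)):
    print(edge)
```

A heuristic this simple ignores fan-in after parallel branches, which is one reason a flat view like the Trace Log View is still useful for workflows that loop or branch heavily.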

Langfuse agent graph for LangGraph execution. Image by Author.

LangSmith's LangGraph tracing captures every node, edge, and state transition in a run with zero configuration beyond setting an environment variable. LangSmith Studio lets you step through agent execution, inspect state at each node, and replay a trace with a different model or prompt. In a LangGraph application, this gives more context than a generic trace tree.

LangSmith trace tree for agent workflow. Image by Author.

Production monitoring

For production monitoring, both platforms track latency, token usage, cost, and error rates. LangSmith includes PagerDuty and webhook alerting for production incidents. Langfuse includes spend alerts with configurable thresholds. At this level, the monitoring features are similar.

Offline and Online Evaluation

Tracing tells you what happened. Evaluation tells you whether it was good. In practice, these tools are most useful when evaluation is part of the everyday workflow rather than a pre-launch checklist.

LLM-as-a-judge and code evaluators

Langfuse's LLM-as-judge became fully open source under MIT in June 2025, so any self-hosted user on v3.65.0 or later gets it without a commercial license. In May 2026, Langfuse shipped Code Evaluators: Python or TypeScript evaluation functions written directly in the Langfuse UI. They run deterministic checks, such as JSON schema validation, regex matching, or tool argument verification, with no token cost and no judge model call.
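The kind of deterministic check a Code Evaluator runs can be sketched in plain Python. The function name evaluate, its single-string input, the score-and-reason return shape, and the get_weather argument schema are all assumptions for illustration; the exact signature Langfuse expects is defined by Langfuse, not shown here. The type checks are also a simplified stand-in for full JSON Schema validation.

```python
import json
import re

# Assumed tool-argument schema for a hypothetical "get_weather" tool.
REQUIRED_ARGS = {"city": str, "days": int}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def evaluate(output: str) -> dict:
    """Deterministically check a tool call serialized as JSON; return a score and a reason."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return {"score": 0.0, "reason": "output is not valid JSON"}
    args = call.get("arguments") if isinstance(call, dict) else None
    if not isinstance(args, dict):
        return {"score": 0.0, "reason": "missing 'arguments' object"}
    for key, expected_type in REQUIRED_ARGS.items():
        value = args.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return {"score": 0.0, "reason": f"'{key}' missing or not {expected_type.__name__}"}
    start = args.get("start_date")
    if start is not None and not DATE_PATTERN.fullmatch(str(start)):
        return {"score": 0.0, "reason": "start_date is not YYYY-MM-DD"}
    return {"score": 1.0, "reason": "all checks passed"}


# Usage: a well-formed tool call passes; a string where an int is expected fails.
print(evaluate('{"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}'))
print(evaluate('{"name": "get_weather", "arguments": {"city": "Paris", "days": "3"}}'))
```

Because nothing here calls a model, the check is repeatable and free to run on every trace.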

LangSmith offers configurable LLM-as-judge evaluators with Boolean, Categorical, and Continuous feedback types, plus built-in templates for Security, Safety, and Quality. It also supports few-shot correction, where human-labeled corrections on evaluator outputs feed back as few-shot examples to improve the evaluator's calibration over time.
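Few-shot correction can be pictured as prompt assembly: human corrections on past judge outputs are inserted as examples ahead of the new item. This is a hypothetical sketch of the general technique; the Correction class, the build_judge_prompt function, and the prompt wording are assumptions and do not reflect how LangSmith formats its evaluator prompts internally.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    output: str       # application output the judge graded
    judge_label: str  # label the LLM judge originally assigned
    human_label: str  # label a human reviewer corrected it to
    note: str = ""    # optional reviewer explanation


def build_judge_prompt(criterion: str, candidate: str,
                       corrections: list[Correction], k: int = 3) -> str:
    """Assemble a judge prompt with up to k human corrections as few-shot examples."""
    lines = [f"Grade the output against this criterion: {criterion}", ""]
    # Only disagreements carry calibration signal; keep the k most recent,
    # assuming `corrections` is in chronological order.
    disagreements = [c for c in corrections if c.judge_label != c.human_label]
    recent = disagreements[-k:] if k > 0 else []
    for c in recent:
        lines.append(f"Output: {c.output}")
        suffix = f" ({c.note})" if c.note else ""
        lines.append(f"Label: {c.human_label}{suffix}")
        lines.append("")
    lines.append(f"Output: {candidate}")
    lines.append("Label:")
    return "\n".join(lines)


# Usage: one correction where the judge was too strict, one where it agreed with the human.
history = [
    Correction("Your refund has been issued.", "unsafe", "safe", "no policy violation"),
    Correction("Hello!", "safe", "safe"),
]
print(build_judge_prompt("Safety", "Here is the admin password: ...", history))
```

Filtering to disagreements keeps the prompt short and focuses the examples on exactly the cases where the judge was miscalibrated.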

Datasets, experiments, and human annotation

Offline evaluation works in both platforms through datasets and side-by-side experiment comparison. Langfuse added Score Analytics in November 2025 to measure evaluator alignment across precision, recall, F1, cost, and accuracy. Baseline comparison, also added in November 2025, lets you flag a specific run as the reference point and surface regressions against it.
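Evaluator alignment boils down to comparing judge verdicts with human labels. The sketch below assumes a binary pass/fail evaluator, treats human labels as ground truth and "pass" as the positive class; Langfuse's exact metric definitions are not described in this article, and alignment_metrics is a hypothetical helper.

```python
def alignment_metrics(judge: list[bool], human: list[bool]) -> dict[str, float]:
    """Treat human labels as ground truth and score a binary judge against them."""
    if len(judge) != len(human):
        raise ValueError("judge and human label lists must have the same length")
    pairs = list(zip(judge, human))
    tp = sum(1 for j, h in pairs if j and h)          # both say pass
    fp = sum(1 for j, h in pairs if j and not h)      # judge too lenient
    fn = sum(1 for j, h in pairs if not j and h)      # judge too strict
    tn = sum(1 for j, h in pairs if not j and not h)  # both say fail
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Usage: the judge and a human both labeled the same five outputs (True = pass).
print(alignment_metrics([True, True, False, True, False],
                        [True, False, False, True, True]))
```

Here the judge agrees with the human on three of five items (accuracy 0.6), and precision, recall, and F1 all come out to 2/3. Low precision would mean the judge passes outputs humans reject; low recall would mean it is stricter than humans.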

Langfuse's GitHub Actions CI/CD integration, released in May 2026 via the langfuse/experiment-action, fails a workflow when experiment scores drop below a threshold. That turns evaluation into a deploy gate instead of a post-release review.
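The core logic of a score-based deploy gate is small: compute an aggregate score and exit non-zero when it falls below the threshold, which is what makes a CI job fail. The script below is a generic, hypothetical stand-in; it does not reproduce the inputs or behavior of langfuse/experiment-action, and the mean-score rule and the command-line layout are assumptions.

```python
import statistics
import sys


def passes_gate(scores: list[float], threshold: float) -> bool:
    """True if the mean experiment score meets the threshold."""
    if not scores:
        return False  # no results is treated as a failure, not a pass
    return statistics.fmean(scores) >= threshold


def main(argv: list[str]) -> int:
    # Hypothetical usage: python gate.py <threshold> <score> [<score> ...]
    threshold, *raw_scores = (float(x) for x in argv[1:])
    ok = passes_gate(raw_scores, threshold)
    mean = statistics.fmean(raw_scores) if raw_scores else float("nan")
    print(f"mean={mean:.3f} threshold={threshold:.3f} -> {'PASS' if ok else 'FAIL'}")
    return 0 if ok else 1  # a non-zero exit code fails the CI job


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A mean is the simplest aggregate; a stricter gate might also require that no single dataset item scores below a floor.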

Langfuse evaluation loop with GitHub Actions. Image by Author.

LangSmith's evaluation setup has one billing behavior to note early: evaluators that add feedback to traces automatically upgrade those traces to extended retention. As I will cover in the pricing section, that changes the cost of evaluation workflows.

Prompt Versioning, Deployment, and A/B Testing

Prompt management here is more than version history. The workflow is: iterate in a sandbox, test against a dataset, promote to production, and roll back cleanly when something breaks.

Langfuse assigns every prompt version a version ID and uses labels like production and staging to control which version is live. Changing a label in the UI is how you deploy or roll back. The SDK caches prompts client-side, so production calls served from the cache add no latency for fetching the active version. Protected labels let admins restrict which roles can modify the production label, which matters when contributors have different levels of access.
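Label-based deployment and client-side caching can be modeled in a few lines. This is a hypothetical in-memory sketch of the pattern, assuming a simple time-to-live cache; PromptRegistry and CachedPromptClient are invented names and not the Langfuse SDK.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical in-memory stand-in for a label-based prompt store."""
    versions: dict[int, str] = field(default_factory=dict)  # version -> prompt text
    labels: dict[str, int] = field(default_factory=dict)    # label -> version
    protected: set[str] = field(default_factory=lambda: {"production"})

    def create_version(self, text: str) -> int:
        version = max(self.versions, default=0) + 1
        self.versions[version] = text
        return version

    def set_label(self, label: str, version: int, is_admin: bool = False) -> None:
        """Deploy or roll back by pointing a label at an existing version."""
        if label in self.protected and not is_admin:
            raise PermissionError(f"label '{label}' is protected")
        if version not in self.versions:
            raise KeyError(version)
        self.labels[label] = version

    def get(self, label: str) -> tuple[int, str]:
        version = self.labels[label]
        return version, self.versions[version]


class CachedPromptClient:
    """Client-side cache: serve the last fetched prompt until the TTL expires."""

    def __init__(self, registry: PromptRegistry, ttl_seconds: float = 60.0):
        self.registry = registry
        self.ttl = ttl_seconds
        self._cache: dict[str, tuple[float, tuple[int, str]]] = {}

    def get(self, label: str) -> tuple[int, str]:
        now = time.monotonic()
        hit = self._cache.get(label)
        if hit and now - hit[0] < self.ttl:
            return hit[1]  # cache hit: no round trip to the registry
        value = self.registry.get(label)
        self._cache[label] = (now, value)
        return value


# Usage: deploy v2, then roll back to v1 by moving the label.
registry = PromptRegistry()
v1 = registry.create_version("Answer briefly.")
v2 = registry.create_version("Answer briefly and cite sources.")
registry.set_label("production", v2, is_admin=True)
registry.set_label("production", v1, is_admin=True)  # rollback
print(registry.get("production"))
```

The trade-off of any TTL cache is staleness: after a label moves, a client keeps serving the old version until its cached entry expires.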

LangSmith manages prompts through Prompt Hub (formerly LangChain Hub), which uses commit-hash versioning so you can pin an exact version programmatically. The Prompt Hub includes a community library that Langfuse does not replicate. A/B testing via dataset experiments is available on both platforms.
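Pinning by commit hash works differently from moving a label: code that references a hash keeps getting the same text no matter what is committed later. The sketch below is a hypothetical illustration of that idea. CommitLog is an invented class, and deriving the hash from the parent hash plus the content is an assumption, not a description of how LangSmith computes its commit hashes.

```python
import hashlib


class CommitLog:
    """Hypothetical append-only prompt history addressed by commit hash."""

    def __init__(self) -> None:
        self._commits: dict[str, str] = {}
        self._order: list[str] = []

    def commit(self, text: str) -> str:
        parent = self._order[-1] if self._order else ""
        # Assumption: hash parent + content, so re-committing identical text
        # later still produces a new, distinct commit.
        digest = hashlib.sha256((parent + "\n" + text).encode()).hexdigest()[:8]
        self._commits[digest] = text
        self._order.append(digest)
        return digest

    def latest(self) -> str:
        return self._commits[self._order[-1]]

    def at(self, commit_hash: str) -> str:
        """Pinned lookup: always returns the same text, whatever is committed later."""
        return self._commits[commit_hash]


# Usage: production code pins the first commit while the prompt keeps evolving.
log = CommitLog()
pinned = log.commit("Summarize the ticket in one sentence.")
log.commit("Summarize the ticket in two sentences.")
print(log.at(pinned))
print(log.latest())
```

Pinning favors reproducibility, since a deploy never changes behavior silently; labels favor operational speed, since rollback needs no code change.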

In this category, the two products are closer than they are in hosting, pricing, or framework setup.

Langfuse vs. LangSmith for Agent Applications

Agents drove much of the feature work on both platforms over the past year. Where the agent is built matters here.

Langfuse surfaces available tools, highlights which tools were called, and shows arguments and call IDs. Expanded observation types distinguish tool calls, embeddings, and guardrail calls in the trace view. As I mentioned earlier, Code Evaluators can also verify tool arguments against a schema. The MCP server expanded in May 2026 to cover 15 tool categories, so agents in Claude Code, Cursor, or OpenAI Codex can query Langfuse data programmatically.

The LangGraph point from the tracing section shows up again here. LangSmith's agent support includes state inspection at every node, trace replay with alternative models, and LangSmith Studio for visual step-through debugging. The Monte Carlo engineering team, which runs a production system involving hundreds of sub-agents, cited this zero-setup LangGraph integration as a key reason they chose LangSmith.

For agents built with CrewAI, Pydantic AI, or other multi-agent frameworks, Langfuse has broader native instrumentation and often needs less manual setup.

Framework and SDK Integrations

Langfuse lists broad integrations across model providers, frameworks, gateways, no-code tools, analytics, and developer tools. Frameworks include LangChain, LangGraph, OpenAI Agents SDK, Pydantic AI, CrewAI, AutoGen, DSPy, Haystack, LlamaIndex, and others. The platform is OpenTelemetry native at the SDK level.

LangSmith's native SDKs cover Python, TypeScript, Go, and Java. Beyond LangChain and LangGraph, it works with the OpenAI SDK, Anthropic SDK, Vercel AI SDK, LlamaIndex, custom implementations, and OpenTelemetry. That means it is not a LangChain-only tracing tool, even if LangGraph remains its closest fit.

Most popular frameworks work with both platforms, so the practical question is not only whether a framework is supported but how much instrumentation code you need to write. LangGraph gets zero-config tracing in LangSmith, while other frameworks may need less setup in Langfuse.

Langfuse Open Source vs. LangSmith Enterprise

Self-hosting changes the operational and compliance picture more than most feature categories do.

Langfuse's self-hosting is free under MIT. Docker Compose works for development or evaluation; production deployments usually use Kubernetes with Helm on GKE, EKS, or AKS. The stack includes ClickHouse, PostgreSQL, Redis, and S3-compatible storage, with a recommended minimum VM of 4 cores and 16 GiB RAM. The software license costs nothing, but your team owns the infrastructure and operations. Its paid self-hosted Enterprise Edition adds dedicated support, audit logs, SCIM, and SLAs.

On compliance, Langfuse Cloud lists SOC 2 Type II, ISO 27001, GDPR, and HIPAA. LangSmith Cloud lists SOC 2 Type II, GDPR, and HIPAA, but not ISO 27001. If your procurement checklist requires ISO 27001, that is a concrete difference.

LangSmith's self-hosting requires an Enterprise contract; there is no free, open-source self-hosting option. The Enterprise plan covers three deployment models: Cloud, Hybrid, and Self-hosted. SmithDB for self-hosted LangSmith is in early access as of May 2026 and not yet generally available.

Langfuse vs. LangSmith Pricing

The headline prices do not tell the whole story.

Pricing also changes often in this category. The numbers below reflect the official pages I checked in June 2026, but check the current pricing pages before you budget around either platform.

Langfuse pricing

Langfuse Cloud charges by units: one unit equals one trace, one observation, or one score. The formula is Units = Traces + Observations + Scores, so a tool-heavy agent run that records many observations consumes more units than a simple prompt-response trace. The free Hobby plan includes 50,000 units per month, 30-day retention, and two users. Core runs $29/month with 100,000 included units, unlimited users, and 90-day retention. Pro is $199/month with 3-year data access and compliance certifications. Enterprise starts at $2,499/month with custom volume pricing. Overage starts at $8 per 100,000 additional units.
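The unit formula is easy to turn into a rough estimator, which shows why trace depth matters more than request count. The sketch below uses the Core numbers quoted above ($29 base, 100,000 included units, $8 per 100,000 extra units) and assumes a flat, pro-rated overage rate; in practice overage "starts at" $8 and may be tiered, so treat the output as a ballpark figure. Both function names are hypothetical.

```python
def monthly_units(traces: int, observations_per_trace: float,
                  scores_per_trace: float) -> int:
    """Units = Traces + Observations + Scores."""
    return round(traces * (1 + observations_per_trace + scores_per_trace))


def core_plan_cost(units: int, base_fee: float = 29.0,
                   included_units: int = 100_000,
                   overage_per_100k: float = 8.0) -> float:
    """Estimated monthly Core cost, assuming a flat, pro-rated overage rate."""
    overage_units = max(0, units - included_units)
    return base_fee + overage_units / 100_000 * overage_per_100k


# Usage: the same 10,000 requests, as a simple chat app and as a tool-heavy agent.
simple = monthly_units(10_000, observations_per_trace=1, scores_per_trace=0)
agent = monthly_units(10_000, observations_per_trace=12, scores_per_trace=2)
print(simple, core_plan_cost(simple))
print(agent, core_plan_cost(agent))
```

With these inputs, the simple app uses 20,000 units and stays within the $29 base fee, while the agent uses 150,000 units and pays about $4 in overage. The request count is identical; only the trace depth changed.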

As I mentioned earlier, self-hosted Langfuse has no software license cost. SCIM, audit logs, and enterprise support require a commercial license.

LangSmith pricing

LangSmith charges per seat and per trace. The Developer plan is free with 5,000 traces per month, one seat, and 14-day retention. Plus runs $39 per seat per month with 10,000 base traces included. Base traces have 14-day retention; extended traces keep data for 400 days and cost more. A team of five on Plus pays $195/month in seats before trace overage. Enterprise pricing is custom.
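The seat arithmetic above can be written as a small estimator. Per-trace overage prices are not listed in this article, so they are required parameters with no defaults; take them from the current LangSmith pricing page. The function name is hypothetical, and the assumption that the 10,000 included traces offset base traces only is flagged in the code.

```python
def langsmith_plus_cost(seats: int, base_traces: int, extended_traces: int,
                        base_rate_per_trace: float, extended_rate_per_trace: float,
                        seat_price: float = 39.0,
                        included_base_traces: int = 10_000) -> float:
    """Estimated monthly Plus cost: seats plus trace overage.

    Trace rates have no defaults on purpose: copy them from the current
    pricing page. Assumption: the included traces offset base traces only.
    """
    seat_cost = seats * seat_price
    billable_base = max(0, base_traces - included_base_traces)
    return (seat_cost
            + billable_base * base_rate_per_trace
            + extended_traces * extended_rate_per_trace)


# Usage: a team of five that stays within the included base traces.
print(langsmith_plus_cost(seats=5, base_traces=8_000, extended_traces=0,
                          base_rate_per_trace=0.0, extended_rate_per_trace=0.0))  # 195.0
```

Unlike Langfuse's unit model, this bill grows with headcount even when traffic is flat, so team size belongs in the estimate alongside trace volume.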

Data retention mechanics

Extended retention kicks in automatically when evaluators add feedback to traces, so read the LangSmith billing documentation on auto-extended retention before setting up evaluation pipelines.
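Because feedback upgrades a trace, the share of traces your online evaluators touch directly sets how many traces are billed at the extended rate. The sketch below assumes every trace that receives evaluator feedback is upgraded and ignores any included allowance; the function names are hypothetical, and the prices are parameters to fill in from the current pricing page.

```python
def retention_split(total_traces: int, evaluator_sample_rate: float) -> tuple[int, int]:
    """Return (base, extended) counts, assuming every trace that receives
    evaluator feedback is auto-upgraded to extended retention."""
    if not 0.0 <= evaluator_sample_rate <= 1.0:
        raise ValueError("evaluator_sample_rate must be between 0 and 1")
    extended = round(total_traces * evaluator_sample_rate)
    return total_traces - extended, extended


def upgrade_surcharge(total_traces: int, evaluator_sample_rate: float,
                      base_rate_per_trace: float, extended_rate_per_trace: float) -> float:
    """Extra cost of the upgrade versus keeping every trace at base retention."""
    _, extended = retention_split(total_traces, evaluator_sample_rate)
    return extended * (extended_rate_per_trace - base_rate_per_trace)


# Usage: scoring every production trace versus sampling 10% of them.
print(retention_split(100_000, 1.0))
print(retention_split(100_000, 0.1))
```

Sampling is the main lever: evaluating 10% of traffic instead of all of it cuts the number of upgraded traces tenfold.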

Those details matter because small differences in trace depth, evaluator usage, and retention can change the monthly bill.

Langfuse vs. LangSmith Comparison Table

The main differences come down to ownership, framework fit, evaluation workflow, and pricing. The table below condenses those points before the final decision sections.

Feature	Langfuse	LangSmith
Open source	Yes (MIT)	No (proprietary)
Self-hosting	Free MIT self-hosting; enterprise controls paid	Enterprise contract required
Evaluation	LLM-as-judge (MIT), code evaluators, human annotation, CI/CD	LLM-as-judge, human annotation, online evaluators, few-shot correction
Prompt management	Label-based deployment, SDK caching, prompt composability	Commit-hash versioning, community Prompt Hub
Ecosystem	Broad integrations, OTel native, works across frameworks	Closest fit for LangChain/LangGraph; OTel support
Agent support	Agent Graphs, Trace Log View, Code Evaluators, MCP server	LangSmith Studio, native LangGraph tracing, state inspection
Compliance	SOC 2 Type II, ISO 27001, GDPR, HIPAA	SOC 2 Type II, GDPR, HIPAA
Pricing model	Unit-based; unlimited users on paid plans	Seat-based + trace-based; dual retention tiers
Fit	Data sovereignty, non-LangChain stacks, CI/CD evaluation	LangGraph teams, managed SaaS preference

Mistakes When Choosing an LLM Observability Platform

First, do not focus only on tracing. Tracing tells you what happened, but evaluation tells you whether the output was good. If you choose based on trace visualization alone, you are judging the platform on the wrong criterion.

Second, watch the pricing mechanics. As covered above, Langfuse costs grow with trace depth, while LangSmith's extended retention can raise the cost of automated evaluation. Run the math before going to production.

Third, self-hosting does not mean the same thing in both products: Langfuse can be self-hosted for free under MIT, while LangSmith self-hosting requires an Enterprise contract. If data sovereignty is a hard requirement, that difference may decide the comparison.

Finally, do not decide on framework compatibility alone. Stacks change. Deployment requirements and evaluation workflows are harder to swap later.

When to Choose Langfuse

Based on the trade-offs above, Langfuse fits better when:

Your team is not primarily using LangChain or LangGraph, and you're building with CrewAI, Pydantic AI, LlamaIndex, or direct API calls to OpenAI or Anthropic.
Data sovereignty is non-negotiable, and LLM inputs, outputs, and traces need to stay on your own infrastructure.
Your compliance checklist requires ISO 27001 in addition to SOC 2 and HIPAA.
Your team wants CI/CD-integrated evaluation with automated regression gates via GitHub Actions.
You need predictable costs for a growing team, since paid Cloud plans include unlimited users.

When to Choose LangSmith

Based on the same trade-offs, LangSmith fits better when:

You're building with LangGraph and want zero-configuration tracing, native graph visualization, and step-through debugging in LangSmith Studio.
Your team wants a managed platform with no infrastructure to run.
You value the community Prompt Hub for discovering and sharing prompts across teams outside your organization.
Your needs extend beyond observability into LangSmith's broader platform, which now includes agent deployment and Fleet management.

Conclusion

Langfuse and LangSmith both solve a real problem, and both have changed a lot over the past year. At this point, the trade-off is clear.

The decision is not about which platform has more features. It is the ownership and ecosystem trade-off from earlier. Do you need to control your data stack, or do you want less setup inside the LangChain/LangGraph world?

One caveat before you decide: both platforms change often. Check the changelogs before you commit.

For related background on the LangChain ecosystem, see our LangChain vs. LangGraph vs. LangSmith vs. LangFlow tutorial.

Author

Khalid Abdelaty

Can I switch from LangSmith to Langfuse later?

Does Langfuse still support self-hosting now that ClickHouse owns it?

Is LangSmith only for LangChain applications?

How does LangSmith's extended retention billing work?

Is the Hobby tier on Langfuse good enough to evaluate the platform properly?
