Course
When a chatbot starts giving bad answers, the instinct is to check the prompt. That works fine for a single LLM call. It stops working when the application is an agent that makes tool calls.
That missing context is what LLM observability platforms try to provide. They're not traditional application monitoring tools. A more traditional tool tells you about latency and error rates. An LLM observability platform tells you which tool call returned a bad result and whether a prompt change improved output quality.
Both Langfuse and LangSmith cover tracing, evaluation, and prompt management, and both released major updates in early 2026. They're not interchangeable, though. The difference comes down to deployment requirements, tech stack, and how your team runs evaluations.
Short answer: Langfuse fits teams that need open-source self-hosting, data control, or a stack outside LangChain. LangSmith fits teams already building with LangChain or LangGraph, but it is no longer limited to that ecosystem. If neither condition is true, I would look at pricing.
What Are Langfuse and LangSmith?
At a high level, both products make LLM applications observable, testable, and debuggable. Here's what each one is.

Langfuse versus LangSmith platform positioning overview. Image by Author.
What is Langfuse?
Langfuse is an open-source LLM engineering platform that launched in 2023. It covers tracing, prompt management, evaluation (LLM-as-judge, human annotation, and code-based checks), dataset experiments, and cost and latency monitoring. The core open-source product is MIT licensed.
In January 2026, ClickHouse announced a $400 million Series D and acquired Langfuse. Langfuse is now part of ClickHouse, the columnar database that already powered the Langfuse backend. The MIT license and open-source identity were confirmed as unchanged at the time.
Langfuse runs as a managed cloud service with US, EU, and Japan regions, or as a self-hosted open-source instance with no software license cost.
What is LangSmith?
LangSmith is the observability and evaluation platform built by LangChain Inc., the team behind LangChain and LangGraph. The platform is proprietary and closed-source. LangChain raised $125 million at a $1.25 billion valuation in October 2025.
Its main capabilities include tracing across an application run, visual debugging, automated evaluations, production monitoring, and prompt management via Prompt Hub and the Playground. In May 2026, LangChain launched SmithDB, a Rust-based data layer that now handles 100% of LangSmith's US Cloud ingestion. SmithDB drops P50 trace tree load to 92 milliseconds and full-text search to 400 milliseconds.
LangSmith is available as a managed cloud service, a hybrid deployment with a customer VPC data plane, or a self-hosted Enterprise deployment.
Open Source vs. Managed SaaS
The core difference between the two platforms is not "open source versus not open source." The real difference is control and portability on one side, and LangChain/LangGraph fit on the other. Langfuse lets you run the stack on your own infrastructure with no licensing cost. LangSmith needs less setup when your application already runs on LangChain or LangGraph.
One update changes how this comparison should be framed: LangSmith now supports OpenTelemetry tracing through the langsmith[otel] package and the LANGSMITH_OTEL_ENABLED=true environment variable. LangSmith is no longer limited to LangChain-only applications. Its closest integration remains with LangGraph, as I will cover in the tracing section.
Here's where the two platforms sit structurally:
|
Dimension |
Langfuse |
LangSmith |
|
Source model |
Open source (MIT) |
Proprietary, closed source |
|
Self-hosting |
Free MIT self-hosting; enterprise controls paid |
Enterprise contract required |
|
Framework approach |
Works across frameworks; broad integrations; OTel native |
Closest fit for LangChain/LangGraph; OTel support |
|
Data sovereignty |
Full; air-gapped deployment possible |
Hybrid and self-hosted for Enterprise customers |
|
Backend database |
ClickHouse |
SmithDB (Rust/DataFusion) |
|
Pricing model |
Unit-based (traces + observations + scores) |
Seat-based plus trace-based with dual retention tiers |
|
Compliance |
SOC 2 Type II, ISO 27001, GDPR, HIPAA |
SOC 2 Type II, GDPR, HIPAA |
The rest of the article unpacks what those differences mean in practice.
Tracing and Observability
Tracing is where the products start to separate. Both capture LLM calls, tool calls, and related metadata, but agent workflows expose the differences faster than simple prompt-response apps do.
Request tracing
Langfuse builds hierarchical traces that capture LLM calls, tool invocations, embeddings, and retrieval steps. You can filter by user, session, cost, latency, or custom metadata. In May 2026, Langfuse added full-text search backed by ClickHouse's native FTS engine, cutting searches that previously took close to 20 seconds to under half a second.
LangSmith captures every LLM call and tool use as an inspectable run tree. With SmithDB now handling all US Cloud ingestion, trace trees load at P50 in 92 milliseconds. LangSmith also includes unsupervised topic clustering, which groups traces by detected theme and gives teams a starting point when they have no idea what's wrong.
Agent workflow visibility
Langfuse added Agent Graphs in November 2025, visualizing execution flow for multi-step agents by inferring graph structure from observation timings and nesting. It works with any instrumented framework, with native LangGraph support included. A Trace Log View was added at the same time, giving a flat stream of agent steps for workflows that loop or branch heavily.

Langfuse agent graph for LangGraph execution. Image by Author.
LangSmith's LangGraph tracing captures every node, edge, and state transition in a run with zero configuration beyond setting an environment variable. LangSmith Studio lets you step through agent execution, inspect state at each node, and replay a trace with a different model or prompt. In a LangGraph application, this gives more context than a generic trace tree.

LangSmith trace tree for agent workflow. Image by Author.
Production monitoring
For production monitoring, both platforms track latency, token usage, cost, and error rates. LangSmith includes PagerDuty and webhook alerting for production incidents. Langfuse includes spend alerts with configurable thresholds. At this level, the monitoring features are similar.
Offline and Online Evaluation
Tracing tells you what happened. Evaluation tells you if it was good. In practice, these tools are more useful when evaluation is part of the workflow, not a pre-launch checklist.
LLM-as-a-judge and code evaluators
Langfuse's LLM-as-judge became fully open-source under MIT in June 2025. Any self-hosted user on v3.65.0 or later gets it without a commercial license. In May 2026, Langfuse shipped Code Evaluators: Python or TypeScript evaluate functions you write directly in the Langfuse UI. These run deterministic checks, such as JSON schema validation, regex validation, or tool argument verification, without token cost or a judge model call.
LangSmith offers configurable LLM-as-judge evaluators with Boolean, Categorical, and Continuous feedback types, plus built-in templates for Security, Safety, and Quality. It also supports few-shot correction, where human-labeled corrections on evaluator outputs feed back as few-shot examples to improve the evaluator's calibration over time.
Datasets, experiments, and human annotation
Offline evaluation works in both platforms through datasets and side-by-side experiment comparison. Langfuse added Score Analytics in November 2025 to measure evaluator alignment across precision, recall, F1, cost, and accuracy. Baseline comparison, also November 2025, lets you flag a specific run as the reference point and surface regressions against it.
Langfuse's GitHub Actions CI/CD integration, released in May 2026 via the langfuse/experiment-action, fails a workflow when experiment scores drop below a threshold. That turns evaluation into a deploy gate instead of a post-release review.

Langfuse evaluation loop with GitHub Actions. Image by Author.
LangSmith's evaluation setup has one billing behavior to note early: evaluators that add feedback to traces automatically upgrade those traces to extended retention. As I will cover in the pricing section, that changes the cost of evaluation workflows.
Prompt Versioning, Deployment, and A/B Testing
Prompt management here is more than version history. The workflow is: iterate in a sandbox, test against a dataset, promote to production, and roll back cleanly when something breaks.
Langfuse assigns every prompt version a version ID and uses labels like production and staging to control which version is live. Changing a label in the UI is how you deploy or roll back. Prompts are cached client-side by the SDK, so no latency is added to production calls when the SDK fetches the active version. Protected labels let admins restrict which roles can modify the production label, which matters when you have a mix of contributors with different levels of access.
LangSmith manages prompts via LangChain Hub with commit-hash versioning for pinning exact versions programmatically. The Prompt Hub includes a community library that Langfuse does not replicate. A/B testing via dataset experiments is available on both platforms.
In this category, the two products are closer than they are in hosting, pricing, or framework setup.
Langfuse vs. LangSmith for Agent Applications
Agents drove much of the feature work on both platforms over the past year. Where the agent is built matters here.
Langfuse surfaces available tools, highlights which tools were called, and shows arguments and call IDs. Expanded observation types distinguish tool calls, embeddings, and guardrail calls in the trace view. As I mentioned earlier, Code Evaluators can also verify tool arguments against a schema. The MCP server expanded in May 2026 to cover 15 tool categories, so agents in Claude Code, Cursor, or OpenAI Codex can query Langfuse data programmatically.
The LangGraph point from the tracing section shows up again here. LangSmith's agent support includes state inspection at every node, trace replay with alternative models, and LangSmith Studio for visual step-through debugging. The Monte Carlo engineering team, which runs a production system involving hundreds of sub-agents, cited this zero-setup LangGraph integration as a key reason they chose it.
For agents built with CrewAI, Pydantic AI, or other multi-agent frameworks, Langfuse has broader native instrumentation and often needs less manual setup.
Framework and SDK Integrations
Langfuse lists broad integrations across model providers, frameworks, gateways, no-code tools, analytics, and developer tools. Frameworks include LangChain, LangGraph, OpenAI Agents SDK, Pydantic AI, CrewAI, AutoGen, DSPy, Haystack, LlamaIndex, and others. The platform is OpenTelemetry native at the SDK level.
LangSmith's native SDKs cover Python, TypeScript, Go, and Java. Beyond LangChain and LangGraph, it works with the OpenAI SDK, Anthropic SDK, Vercel AI SDK, LlamaIndex, custom implementations, and OpenTelemetry. That means it is not a LangChain-only tracing tool, even if LangGraph remains its closest fit.
The practical question is not only whether a framework is supported, since most popular frameworks work with both platforms. It is how much instrumentation you need to write. LangGraph gets zero-config tracing in LangSmith. Other frameworks may take less setup in Langfuse. Setup effort varies by stack.
Langfuse Open Source vs. LangSmith Enterprise
Self-hosting changes the operational and compliance picture more than most feature categories do.
Langfuse's self-hosting is free under MIT. Docker Compose works for development or evaluation; production deployments usually use Kubernetes with Helm on GKE, EKS, or AKS. The stack includes ClickHouse, PostgreSQL, Redis, and S3-compatible storage, with a recommended minimum VM of 4 cores and 16 GiB RAM. The software license costs nothing, but your team owns the infrastructure and operations. Its paid self-hosted Enterprise Edition adds dedicated support, audit logs, SCIM, and SLAs.
On compliance, Langfuse Cloud holds SOC 2 Type II, ISO 27001, GDPR, and HIPAA certifications. LangSmith Cloud holds SOC 2 Type II, GDPR, and HIPAA. ISO 27001 is not listed for LangSmith. If your procurement process checks that box, that's a concrete difference.
LangSmith's self-hosting requires an Enterprise contract. There's no open-source, free self-hosting path available. Three deployment models (Cloud, Hybrid, and Self-hosted) all sit under the Enterprise umbrella. SmithDB for self-hosted LangSmith is in early access as of May 2026, not yet generally available.
Langfuse vs. LangSmith Pricing
The headline prices do not tell the whole story.
Pricing also changes often in this category. The numbers below reflect the official pages I checked in June 2026, but check the current pricing pages before you budget around either platform.
Langfuse pricing
Langfuse Cloud charges by units: one unit equals one trace, one observation, or one score. The formula is Units = Traces + Observations + Scores, so a tool-heavy agent run can cost more than a simple prompt-response trace. The free Hobby plan includes 50,000 units per month, 30-day retention, and two users. Core runs $29/month with 100,000 included units, unlimited users, and 90-day retention. Pro is $199/month with 3-year data access and compliance certifications. Enterprise starts at $2,499/month with custom volume pricing. Overage starts at $8 per 100,000 additional units.
As I mentioned earlier, self-hosted Langfuse has no software license cost. SCIM, audit logs, and enterprise support require a commercial license.
LangSmith pricing
LangSmith charges per seat and per trace. The Developer plan is free with 5,000 traces per month, one seat, and 14-day retention. Plus runs $39 per seat per month with 10,000 base traces included. Base traces have 14-day retention; extended traces keep data for 400 days and cost more. A team of five on Plus pays $195/month in seats before trace overage. Enterprise pricing is custom.
Data retention mechanics
As I mentioned earlier, extended retention kicks in automatically when evaluators add feedback to traces. Read the LangSmith billing documentation on auto-extended retention before setting up evaluation pipelines.
Those details matter because small differences in trace depth, evaluator usage, and retention can change the monthly bill.
Langfuse vs. LangSmith Comparison Table
As I mentioned earlier, the main differences are ownership, framework fit, evaluation workflow, and pricing. The table below compresses those points before the final decision sections.
|
Feature |
Langfuse |
LangSmith |
|
Open source |
Yes (MIT) |
No (proprietary) |
|
Self-hosting |
Free MIT self-hosting; enterprise controls paid |
Enterprise contract required |
|
Evaluation |
LLM-as-judge (MIT), code evaluators, human annotation, CI/CD |
LLM-as-judge, human annotation, online evaluators, few-shot correction |
|
Prompt management |
Label-based deployment, SDK caching, prompt composability |
Commit-hash versioning, community Prompt Hub |
|
Ecosystem |
Broad integrations, OTel native, works across frameworks |
Closest fit for LangChain/LangGraph; OTel support |
|
Agent support |
Agent Graphs, Trace Log View, Code Evaluators, MCP server |
LangSmith Studio, native LangGraph tracing, state inspection |
|
Compliance |
SOC 2 Type II, ISO 27001, GDPR, HIPAA |
SOC 2 Type II, GDPR, HIPAA |
|
Pricing model |
Unit-based; unlimited users on paid plans |
Seat-based + trace-based; dual retention tiers |
|
Fit |
Data sovereignty, non-LangChain stacks, CI/CD evaluation |
LangGraph teams, managed SaaS preference |
Mistakes When Choosing an LLM Observability Platform
First thing, in my view: Do not focus only on tracing. Tracing tells you what happened, but evaluation tells you whether the output was good. If you choose based on trace visualization alone, you are using the wrong criterion.
Second thing: Watch the pricing mechanics. As covered above, Langfuse costs grow with trace depth, while LangSmith's extended retention can change the cost of automated evaluation. Run the math before production.
Third, self-hosting does not mean the same thing in both products. The self-hosting section above shows why. If data sovereignty is a hard requirement, that difference may decide the comparison.
Finally, do not decide on framework compatibility alone. Stacks change. Deployment requirements and evaluation workflows are harder to swap later.
When to Choose Langfuse
Based on the trade-offs above, Langfuse fits better when:
- Your team is not primarily using LangChain or LangGraph, and you're building with CrewAI, Pydantic AI, LlamaIndex, or direct API calls to OpenAI or Anthropic.
- Data sovereignty is non-negotiable, and LLM inputs, outputs, and traces need to stay on your own infrastructure.
- Your compliance checklist requires ISO 27001 in addition to SOC 2 and HIPAA.
- Your team wants CI/CD-integrated evaluation with automated regression gates via GitHub Actions.
- You need predictable costs for a growing team, since paid Cloud plans include unlimited users.
When to Choose LangSmith
Based on the same trade-offs, LangSmith fits better when:
- You're building with LangGraph and want zero-configuration tracing, native graph visualization, and step-through debugging in LangSmith Studio.
- Your team wants a managed platform with no infrastructure to run.
- You value the community Prompt Hub for discovering and sharing prompts across teams outside your organization.
- Your needs extend beyond observability into LangSmith's broader platform, which now includes agent deployment and Fleet management.
Conclusion
Langfuse and LangSmith both solve a real problem, and both have changed a lot over the past year. At this point, the trade-off is clear.
The decision is not about which platform has more features. It is the ownership and ecosystem trade-off from earlier. Do you need to control your data stack, or do you want less setup inside the LangChain/LangGraph world?
One caveat before you decide: both platforms change often. Check the changelogs before you commit.
For related background on the LangChain ecosystem, see our LangChain vs. LangGraph vs. LangSmith vs. LangFlow tutorial.
I’m a data engineer and community builder who works across data pipelines, cloud, and AI tooling while writing practical, high-impact tutorials for DataCamp and emerging developers.
FAQs
Can I switch from LangSmith to Langfuse later?
Yes, though it requires work. The OpenTelemetry point above helps with tracing portability. The harder part is exporting evaluation datasets before retention windows close.
Does Langfuse still support self-hosting now that ClickHouse owns it?
Yes. As I mentioned earlier, the MIT license and self-hosting were confirmed as unchanged at the time of the January 2026 acquisition. The practical caveat is operations.
Is LangSmith only for LangChain applications?
Not anymore. As mentioned above, LangSmith supports OpenTelemetry tracing through langsmith[otel]. The closest integration is still with LangGraph, but non-LangChain teams can use LangSmith too.
How does LangSmith's extended retention billing work?
As covered in the pricing section, LangSmith has two trace retention tiers: a 14-day base and a 400-day extended. Extended retention is triggered when feedback is added, a run rule fires, or a trace enters an annotation queue.
Is the Hobby tier on Langfuse good enough to evaluate the platform properly?
For individual developers, yes. The 50,000 unit monthly limit and 30-day retention are enough to connect an application and inspect real traces. For production evaluation, the self-hosting point above matters because the MIT version removes unit limits and user caps.


