DeepSeek R1: Features, o1 Comparison, Distilled Models & More

Learn about DeepSeek-R1's key features, development process, distilled models, how to access it, pricing, and how it compares to OpenAI o1.
Updated Jan 31, 2025  · 8 min read

DeepSeek just announced DeepSeek-R1, the next step in their work on reasoning models. It’s an upgrade from their earlier DeepSeek-R1-Lite-Preview and shows they’re serious about competing with OpenAI’s o1.

With OpenAI planning to release o3 later this year, it’s clear that competition in reasoning models is ramping up. Even though DeepSeek may lag slightly behind in some areas, its open-source nature and significantly lower pricing make it a compelling option for the AI community.

In this blog, I’ll break down DeepSeek-R1’s key features, development process, distilled models, how to access it, pricing, and how it compares to OpenAI’s models.

I originally wrote this article on the day DeepSeek-R1 was released, but I’ve now updated it with a new section covering its aftermath—how it impacted the stock market, the economics of AI (including Jevons’ paradox and the commoditization of AI models), and OpenAI’s accusation that DeepSeek distilled their models.

What Is DeepSeek-R1?

DeepSeek-R1 is an open-source reasoning model developed by DeepSeek, a Chinese AI company, to address tasks requiring logical inference, mathematical problem-solving, and real-time decision-making.

What sets reasoning models like DeepSeek-R1 and OpenAI’s o1 apart from traditional language models is their ability to show how they arrived at a conclusion.

Example of DeepSeek-R1 showing its reasoning via the DeepThink feature

With DeepSeek-R1, you can follow its logic, making it easier to understand and, if necessary, challenge its output. This capability gives reasoning models an edge in fields where outcomes need to be explainable, like research or complex decision-making.

What makes DeepSeek-R1 particularly competitive and attractive is its open-source nature. Unlike proprietary models, it can be explored, modified, and deployed by developers and researchers within certain technical limits, such as resource requirements.

How Was DeepSeek-R1 Developed?

In this section, I’ll walk you through how DeepSeek-R1 was developed, starting with its predecessor, DeepSeek-R1-Zero.

DeepSeek-R1-Zero

DeepSeek-R1 began with R1-Zero, a model trained entirely through reinforcement learning. While this approach allowed it to develop strong reasoning capabilities, it came with major drawbacks. The outputs were often difficult to read, and the model sometimes mixed languages within its responses. These limitations made R1-Zero less practical for real-world applications.

Challenges of pure reinforcement learning

The reliance on pure reinforcement learning created outputs that were logically sound but poorly structured. Without the guidance of supervised data, the model struggled to communicate its reasoning effectively. This was a barrier for users who needed clarity and precision in the results.

Improvements with DeepSeek-R1

To address these issues, DeepSeek made a change in R1’s development by combining reinforcement learning with supervised fine-tuning. This hybrid approach incorporated curated datasets, improving the model’s readability and coherence. Problems like language mixing and fragmented reasoning were significantly reduced, making the model more suitable for practical use.

If you want to learn more about the development of DeepSeek-R1, I recommend reading the release paper.

Distilled Models of DeepSeek-R1

Distillation in AI is the process of creating smaller, more efficient models from larger ones, preserving much of their reasoning power while reducing computational demands. DeepSeek applied this technique to create a suite of distilled models from R1, using Qwen and Llama architectures.

Benchmark results of the distilled models. Source: DeepSeek’s release paper
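
To make the distillation idea more concrete, here’s a minimal sketch of what this style of distillation boils down to: according to the release paper, the smaller models were fine-tuned directly on samples generated by R1 rather than trained to match the teacher’s output distribution. I’m assuming an autoregressive Hugging Face-style student model and pre-tokenized teacher samples; the function and variable names are illustrative, not DeepSeek’s actual pipeline:

```python
import torch.nn.functional as F

def sft_distill_step(student, teacher_sample_ids, optimizer):
    """One distillation step: train the student to reproduce, token by token,
    a full output (reasoning trace + answer) generated by the teacher."""
    inputs = teacher_sample_ids[:, :-1]   # all tokens except the last
    targets = teacher_sample_ids[:, 1:]   # next-token labels, shifted by one
    logits = student(input_ids=inputs).logits
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```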

Qwen-based distilled models

DeepSeek’s Qwen-based distilled models focus on efficiency and scalability, offering a balance between performance and computational requirements.

DeepSeek-R1-Distill-Qwen-1.5B

This is the smallest distilled model, achieving 83.9% on MATH-500. MATH-500 tests the ability to solve high-school-level mathematical problems with logical reasoning and multi-step solutions. This result shows that the model handles basic mathematical tasks well despite its compact size.

However, its performance drops significantly on LiveCodeBench (16.9%), a benchmark designed to assess coding abilities, highlighting its limited capability in programming tasks.

DeepSeek-R1-Distill-Qwen-7B

Qwen-7B shines on MATH-500, scoring 92.8%, which demonstrates its strong mathematical reasoning capabilities. It also performs reasonably well on GPQA Diamond (49.1%), which evaluates factual question answering, indicating it has a good balance between mathematical and factual reasoning.

However, its performance on LiveCodeBench (37.6%) and CodeForces (1189 rating) suggests it is less suited for complex coding tasks.

DeepSeek-R1-Distill-Qwen-14B

This model performs well on MATH-500 (93.9%), reflecting its ability to handle complex mathematical problems. Its 59.1% score on GPQA Diamond also indicates competence in factual reasoning.

Its performance on LiveCodeBench (53.1%) and CodeForces (1481 rating) shows room for growth in coding and programming-specific reasoning tasks.

DeepSeek-R1-Distill-Qwen-32B

The largest Qwen-based model achieves the highest score among its peers on AIME 2024 (72.6%), which evaluates advanced multi-step mathematical reasoning. It also excels on MATH-500 (94.3%) and GPQA Diamond (62.1%), demonstrating its strength in both mathematical and factual reasoning.

Its results on LiveCodeBench (57.2%) and CodeForces (1691 rating) suggest it is versatile but still not optimized for programming tasks compared to models specialized in coding.

Llama-based distilled models

DeepSeek’s Llama-based distilled models prioritize high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision.

DeepSeek-R1-Distill-Llama-8B

Llama-8B performs well on MATH-500 (89.1%) and reasonably on GPQA Diamond (49.0%), indicating its ability to handle mathematical and factual reasoning. However, it scores lower on coding benchmarks like LiveCodeBench (39.6%) and CodeForces (1205 rating), which highlights its limitations in programming-related tasks compared to Qwen-based models.

DeepSeek-R1-Distill-Llama-70B

The largest distilled model, Llama-70B, delivers top-tier performance on MATH-500 (94.5%), the best among all distilled models, and achieves a strong score of 86.7% on AIME 2024, making it an excellent choice for advanced mathematical reasoning.

It also performs well on LiveCodeBench (57.5%) and CodeForces (1633 rating), suggesting it is more competent in coding tasks than most other models. In this domain, it’s on par with OpenAI’s o1-mini or GPT-4o.

How to Access DeepSeek-R1

You can access DeepSeek-R1 through two primary methods: the web-based DeepSeek Chat platform and the DeepSeek API, allowing you to choose the option that best fits your needs.

Web access: DeepSeek Chat platform

The DeepSeek Chat platform offers a straightforward way to interact with DeepSeek-R1. To access it, you can either go directly to the chat page or click Start Now on the home page.

The DeepSeek home page

After registering, you can select the “Deep Think” mode to experience DeepSeek-R1’s step-by-step reasoning capabilities.

The DeepSeek chat interface, showing the DeepThink option that activates DeepSeek-R1

API access: DeepSeek’s API

For integrating DeepSeek-R1 into your applications, the DeepSeek API provides programmatic access.

To get started, you’ll need to obtain an API key by registering on the DeepSeek Platform.

The API is compatible with OpenAI’s format, making integration straightforward if you’re familiar with OpenAI’s tools. You can find more instructions in DeepSeek’s API documentation.
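
As a minimal sketch, here’s what a first request might look like using the official openai Python package pointed at DeepSeek’s endpoint. The base URL, model name, and reasoning_content field follow DeepSeek’s documentation at the time of writing:

```python
from openai import OpenAI

# Point the OpenAI SDK at DeepSeek's endpoint; use the API key you created
# on the DeepSeek Platform.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek-R1
    messages=[{"role": "user", "content": "How many prime numbers are smaller than 30?"}],
)

message = response.choices[0].message
print(message.reasoning_content)  # the model's chain of thought
print(message.content)            # the final answer
```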

DeepSeek-R1 Pricing

As of January 21, 2025, the chat platform is free to use, but with a daily cap of 50 messages in “Deep Think” mode. This limitation makes it ideal for light usage or exploration.

The API offers two models—deepseek-chat (DeepSeek-V3) and deepseek-reasoner (DeepSeek-R1)—with the following pricing structure (per 1M tokens):

MODEL              CONTEXT LENGTH  MAX COT TOKENS  MAX OUTPUT TOKENS  INPUT PRICE (CACHE HIT)  INPUT PRICE (CACHE MISS)  OUTPUT PRICE
deepseek-chat      64K             -               8K                 $0.07 ($0.014*)          $0.27 ($0.14*)            $1.10 ($0.28*)
deepseek-reasoner  64K             32K             8K                 $0.14                    $0.55                     $2.19

*The prices in parentheses are deepseek-chat’s discounted rates under a promotional offer in effect at the time of writing.

Source: DeepSeek’s pricing page

To ensure you have the most up-to-date pricing information and understand how to calculate the cost of CoT (Chain-of-Thought) reasoning, visit DeepSeek’s pricing page.
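
As a rough illustration of how these numbers translate into real costs, here’s a small calculation using the deepseek-reasoner prices above. I’m assuming, per DeepSeek’s docs, that CoT tokens are billed at the output rate:

```python
# deepseek-reasoner prices in USD per 1M tokens, from the table above
PRICES = {"cache_hit": 0.14, "cache_miss": 0.55, "output": 2.19}

def estimate_cost(cache_hit_tokens, cache_miss_tokens, output_tokens_incl_cot):
    """Estimate the cost of a single API call in USD."""
    return (
        cache_hit_tokens / 1e6 * PRICES["cache_hit"]
        + cache_miss_tokens / 1e6 * PRICES["cache_miss"]
        + output_tokens_incl_cot / 1e6 * PRICES["output"]
    )

# Example: a 2K-token prompt (all cache misses) that produces
# 4K CoT tokens plus a 1K-token final answer costs about $0.012.
print(f"${estimate_cost(0, 2_000, 5_000):.4f}")
```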

DeepSeek-R1 vs. OpenAI o1: Benchmark Performance

DeepSeek-R1 competes directly with OpenAI o1 across several benchmarks, often matching or surpassing it.

Benchmark comparison of DeepSeek-R1 and OpenAI o1. Source: DeepSeek’s release paper

Mathematics benchmarks: AIME 2024 and MATH-500

In mathematics benchmarks, DeepSeek-R1 demonstrates strong performance. On AIME 2024, which evaluates advanced multi-step mathematical reasoning, DeepSeek-R1 scores 79.8%, slightly ahead of OpenAI o1-1217 at 79.2%.

On MATH-500, DeepSeek-R1 takes the lead with an impressive 97.3%, slightly surpassing OpenAI o1-1217 at 96.4%. This benchmark tests models on diverse high-school-level mathematical problems requiring detailed reasoning.

Coding benchmarks: Codeforces & SWE-bench Verified

The Codeforces benchmark evaluates a model’s coding and algorithmic reasoning capabilities, represented as a percentile ranking against human participants. OpenAI o1-1217 leads with 96.6%, while DeepSeek-R1 achieves a very competitive 96.3%, with only a minor difference.

The SWE-bench Verified benchmark evaluates reasoning in software engineering tasks. DeepSeek-R1 performs strongly with a score of 49.2%, slightly ahead of OpenAI o1-1217’s 48.9%. This result positions DeepSeek-R1 as a strong contender in specialized reasoning tasks like software verification.

General knowledge benchmarks: GPQA Diamond and MMLU

For factual reasoning, GPQA Diamond measures the ability to answer general-purpose knowledge questions. DeepSeek-R1 scores 71.5%, trailing OpenAI o1-1217, which achieves 75.7%. This result highlights OpenAI o1-1217’s slight advantage in factual reasoning tasks.

On MMLU, a benchmark that spans various disciplines and evaluates multitask language understanding, OpenAI o1-1217 slightly edges out DeepSeek-R1, scoring 91.8% compared to DeepSeek-R1’s 90.8%.

DeepSeek’s Aftermath

The release of DeepSeek-R1 has had far-reaching consequences, affecting stock markets, reshaping AI economics, and sparking controversy over model development practices.

Impact on the stock market

DeepSeek’s introduction of its R1 model, which offers advanced AI capabilities at a fraction of the cost of competitors, led to a substantial decline in the stock prices of major U.S. tech companies.

Nvidia, for instance, experienced a nearly 18% drop in its stock value, equating to a loss of approximately $600 billion in market capitalization. This decline was driven by investor concerns that DeepSeek’s efficient AI models could reduce the demand for high-performance hardware traditionally supplied by companies like Nvidia.

Jevons’ paradox and the commoditization of AI models

Open-weight models like DeepSeek-R1 are lowering costs and forcing AI companies to rethink their pricing strategies. This is evident in the stark pricing contrast:

  • OpenAI’s o1 costs $60 per million output tokens
  • DeepSeek-R1 costs $2.19 per million output tokens

Some industry leaders have pointed to Jevons’ paradox—the idea that as efficiency increases, overall consumption can rise rather than fall. Microsoft CEO Satya Nadella hinted at this, arguing that as AI gets cheaper, demand will explode.

However, I liked the more balanced view from The Economist, which argues that a full Jevons effect is rare and depends on whether price is the main barrier to adoption. With only “5% of American firms currently using AI and 7% planning to adopt it,” the effect will probably be limited. Many businesses still find AI integration difficult or unnecessary.

OpenAI accused DeepSeek of distillation

Alongside its disruptive impact, DeepSeek has also found itself at the center of controversy. OpenAI has accused DeepSeek of distilling its models—essentially extracting knowledge from OpenAI’s proprietary systems and replicating their performance in a more compact, efficient model.

So far, OpenAI has provided no direct evidence for this claim, and to many, the accusation looks more like a strategic move to reassure investors amid the shifting AI landscape.

Conclusion

DeepSeek-R1 is a strong competitor in reasoning-focused AI, performing on par with OpenAI’s o1. While OpenAI’s o1 might have a slight edge in coding and factual reasoning, I think DeepSeek-R1’s open-source nature and cost-efficient access make it an appealing option.

As OpenAI prepares to release o3, I’m looking forward to seeing how this growing competition shapes the future of reasoning models. For now, DeepSeek-R1 is a compelling alternative.

FAQs

How does DeepSeek-R1 handle multilingual queries?

DeepSeek-R1 is optimized for English and Chinese, but its performance may degrade for queries in other languages. Some outputs may mix English and Chinese, particularly when handling reasoning tasks. Future updates are expected to address this limitation.

Can DeepSeek-R1 be fine-tuned for specific tasks or industries?

Yes, as an open-source model, DeepSeek-R1 can be fine-tuned for specific tasks, provided you have the necessary computational resources and data. This flexibility makes it particularly appealing for researchers and organizations needing domain-specific applications.
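
As a minimal sketch of what that might look like, here’s how you could attach LoRA adapters to the smallest distilled checkpoint with Hugging Face transformers and peft. The target modules listed are typical for Qwen-style attention layers; treat the details as a starting point rather than a recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load the smallest distilled model from the Hugging Face Hub.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach lightweight LoRA adapters so fine-tuning fits on modest hardware.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# From here, train on your domain-specific dataset, e.g. with the HF Trainer.
```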

Are there limits to how long the outputs of DeepSeek-R1 can be?

Yes, the output token limits for DeepSeek-R1 vary depending on the access method. For example, the API’s deepseek-reasoner model supports a maximum of 8,000 output tokens for the final answer, while the reasoning steps (Chain-of-Thought) can use up to 32,000 additional tokens, as shown in the pricing table above.
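
Reusing the client from the API access example above, you can cap the answer length with the standard max_tokens parameter (a sketch; per DeepSeek’s docs at the time of writing, this budget applies to the final answer rather than the CoT):

```python
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_tokens=4000,  # cap on the final answer; the API allows up to 8K here
)
print(response.choices[0].message.content)
```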

What kind of hardware is needed to run DeepSeek-R1 locally?

Running DeepSeek-R1 or its distilled models locally requires high-performance GPUs or TPUs, especially for the larger models like DeepSeek-R1-Distill-Llama-70B. Smaller distilled versions, such as Qwen-1.5B, are more feasible for systems with limited resources.
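
For the smaller distilled versions, one convenient local setup is Ollama. As a sketch, assuming you’ve installed the Ollama runtime and pulled a distilled tag (e.g. with ollama pull deepseek-r1:7b), the Python client can stream responses like this:

```python
import ollama  # pip install ollama; requires a running Ollama server

# Stream a response from a locally hosted distilled model.
stream = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```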

How does context caching work in DeepSeek's API, and how much can it save?

Context caching stores repeated input tokens to reduce costs. For example, if you reuse inputs in multi-turn conversations, the system retrieves these tokens from the cache at a significantly lower price. This feature is particularly beneficial for workflows with repetitive queries.
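
To see how much of a prompt was cached, you can inspect the usage object that comes back with each response. This sketch assumes the prompt_cache_hit_tokens and prompt_cache_miss_tokens fields that DeepSeek’s docs describe as extensions to the OpenAI-style usage object:

```python
# Inspect how the prompt tokens of a request were billed.
usage = response.usage.model_dump()
hit = usage.get("prompt_cache_hit_tokens", 0)
miss = usage.get("prompt_cache_miss_tokens", 0)
print(f"Cache hits: {hit} tokens, cache misses: {miss} tokens")
# At deepseek-reasoner rates, cached input costs $0.14/M instead of $0.55/M.
```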


