
Firecrawl: AI Web Crawler Built for LLM Applications

Discover how Firecrawl simplifies web data extraction with AI-driven crawling, scraping, and mapping tools. Learn how to use it with Python and AI agents to power modern workflows.
Jul 3, 2025  · 8 min read

Firecrawl is a new kind of web crawler optimized for AI workflows. It allows developers to scrape single pages, entire websites, or even the broader web. It removes much of the complexity of traditional web scraping with features like JavaScript rendering, automatic markdown conversion, and integration with popular LLM frameworks.

In this article, I will not only describe the main functionality of Firecrawl, but also share some practical advanced tips, cover its ecosystem integrations, and mention some real-world applications, so you can start practicing for yourself.

What Is Firecrawl?

Firecrawl is an AI-powered web crawler developed by Mendable.ai. It is an API service that crawls websites and converts them into clean LLM-ready data like markdown, JSON, etc.

Unlike traditional scrapers such as BeautifulSoup or Puppeteer, which parse pages blindly and often return noisy output, Firecrawl's AI-driven approach understands page context and extracts the main content intelligently. It can turn entire websites into clean markdown or structured data, which makes it ideal for LLM tasks.

Firecrawl has three core modes, which I'll discuss in detail shortly: scrape mode for a single URL, crawl mode for an entire website, and map mode for URL discovery.

Key Features of Firecrawl

What makes Firecrawl stand out:

  • No sitemap required: its intelligent navigation discovers pages on its own.
  • Handles heavy JavaScript and dynamic web content.
  • Flexible output: clean markdown, HTML, JSON, screenshots, and more.
  • Built-in proxy, anti-bot, and caching mechanisms.
  • Batch processing and concurrency for large-scale jobs.
  • Extremely customizable: you can exclude tags, crawl with custom headers, set crawl depth, and more.
  • Integrations with LLM frameworks like LangChain, LlamaIndex, and CrewAI.
  • Enterprise friendly: configurable rate limits, concurrency controls, and reliability features.

How to Set Up Firecrawl

The nice part about Firecrawl is that you can get it working very quickly. Here are the main steps to install and configure it:

API key configuration

  1. Sign up at firecrawl.dev to obtain an API key.
  2. Install the Python client library:

pip install firecrawl-py

If you plan to load Firecrawl data through LangChain, also install the community package that provides the loader:

pip install langchain-community

  3. Securely store the API key using environment variables:
import os
from firecrawl import FirecrawlApp

# Set your Firecrawl API key as an environment variable for security
api_key = os.getenv('FIRECRAWL_API_KEY')
# Create an instance of the FirecrawlApp class with your API key 
app = FirecrawlApp(api_key=api_key)

Basic scraping example

You can make your first scrape using the .scrape_url() method:

response = app.scrape_url(url='https://firecrawl.dev', formats=['markdown'])
print(response)

This code retrieves the main content of the firecrawl.dev webpage as markdown.
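
Depending on your SDK version, the response is either a typed object or a plain dictionary. Here is a minimal sketch of reading the markdown content; the attribute access assumes the v1 firecrawl-py SDK:

# Read the scraped markdown from the response (v1 SDK returns a typed object)
markdown_text = response.markdown
# Older SDK versions return a dict instead, e.g. response['markdown']
print(markdown_text[:500])  # preview the first 500 characters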

Understanding Firecrawl Modes and Endpoints

Firecrawl has three core modes that define how broadly you want to scrape. In this section, I will walk you through each one.

Scrape mode

Scrape mode targets individual URLs and is ideal for extracting product details or news articles. The JSON extraction options enable schema-based, LLM-powered extraction. What you should do is define the schema you want your final JSON to follow, set up the configuration, and start scraping:

from firecrawl import JsonConfig
from pydantic import BaseModel

# Define the expected data structure for the JSON extraction schema
class ExtractSchema(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
    is_in_yc: bool

# Define the JSON configuration and set it to llm extraction mode
json_config = JsonConfig(
    extractionSchema=ExtractSchema.model_json_schema(),
    mode="llm-extraction",
    pageOptions={"onlyMainContent": True}
)

# Scrape the URL, extract data, and format it into JSON
llm_extraction_result = app.scrape_url(
    'https://firecrawl.dev',
    formats=["json"],
    json_options=json_config
)
print(llm_extraction_result)
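
The structured fields come back alongside any other requested formats. Here is a small, hedged sketch of reading them; whether the field is called json or extract, and whether it is accessed as an attribute or a dictionary key, depends on your SDK version:

# Read the structured data produced by the LLM extraction
extracted = llm_extraction_result.json  # some versions expose it as ['json'] or .extract
print(extracted["company_mission"])
print(extracted["supports_sso"])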

Here’s another type of single-URL scraping, this time using page actions such as waiting, clicking a selector, and taking a screenshot:

# Get a screenshot of the top of the overview page
scrape_result = app.scrape_url('https://firecrawl.dev', 
    formats=['markdown', 'html', "screenshot"], 
    actions=[
        {"type": "wait", "milliseconds": 3000},
        {"type": "click", "selector": "h1"},
        {"type": "wait", "milliseconds": 3000},
        {"type": "scrape"},
        {"type": "screenshot"}
    ],
)
print(scrape_result)

This script first opens the website, waits for 3000 milliseconds, clicks on the first h1 element, waits again, and then scrapes the page and takes a screenshot. If you get timeout errors, try increasing the wait times along with the timeout parameter.
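
For slow pages, you can also raise the overall request timeout. A hedged sketch (the value is in milliseconds, and the exact keyword may differ between SDK versions):

# Allow up to 60 seconds for the whole scrape before it fails
scrape_result = app.scrape_url(
    'https://firecrawl.dev',
    formats=['markdown'],
    timeout=60000  # milliseconds; increase for pages that render slowly
)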

Crawl mode

The crawl mode allows you to crawl an entire website, including all accessible subpages, without requiring a sitemap. It returns a job ID to track progress and supports metadata retrieval when you want to grab extra info like headers and timestamps, rate control to throttle how fast you hit the site, and recursive depth settings to decide how many link levels to follow. The steps of a crawler are as follows:

  1. First, the crawler opens the given URL
  2. It scrapes it
  3. All hyperlinks pointing to subpages are collected and queued for their own crawls
  4. The steps are repeated until a defined maximum is reached.

Here is an example of a crawler using Python requests:

import requests

url = "https://api.firecrawl.dev/v1/crawl" 
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "url": "https://docs.firecrawl.dev", 	# site to crawl
    "limit": 100, 		# max items
    "scrapeOptions": {
        "formats": ["markdown", "html"]  	# output formats
    }
}

response = requests.post(url, headers=headers, json=data)

print(response.status_code)
print(response.json())

This code tells Firecrawl to fetch up to 100 pages from https://docs.firecrawl.dev and return each page in both Markdown and HTML, then prints the result.
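
Because crawls can take a while, you can also start the job asynchronously and poll the returned job ID for its status. Here is a minimal sketch using the Python SDK; the method names async_crawl_url and check_crawl_status come from the v1 firecrawl-py SDK and may differ in your version:

import time

# Start the crawl without blocking and keep the returned job ID
job = app.async_crawl_url('https://docs.firecrawl.dev', limit=100)

# Poll the job until it completes, then read the scraped pages
while True:
    status = app.check_crawl_status(job.id)
    if status.status == 'completed':
        break
    time.sleep(5)  # wait a few seconds between status checks

print(len(status.data), "pages crawled")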

If you want to learn more about APIs, check out our Introduction to APIs in Python course, Working with APIs in Python code-along, and our Mastering Python APIs: A Comprehensive Guide to Building and Using APIs in Python tutorial.

Map mode

Map mode lets you input a website and retrieve every URL on it extremely fast. The /map endpoint transforms a single URL into a comprehensive sitemap, making it ideal when you need to prompt end users to select links to scrape, quickly uncover all site URLs, focus on pages related to a specific topic using the search parameter, or restrict your crawl to particular pages.

A consideration: Because this endpoint prioritizes speed, it may not capture every link.

# Map a website:
map_result = app.map_url('https://firecrawl.dev')
print(map_result)

This code tells Firecrawl to discover all of the links on https://firecrawl.dev and list them for you.
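
To focus the discovery on a topic, you can pass the search parameter mentioned above. A short, hedged sketch (parameter support may vary by SDK version):

# Only return URLs related to a given keyword, e.g. documentation pages
map_result = app.map_url('https://firecrawl.dev', search='docs')
print(map_result)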

Real-World Applications of Firecrawl

Firecrawl is a versatile tool, as we have seen, which makes it extremely handy in real-world scenarios. It works well for scraping job boards or news websites to compile up-to-date listings or articles for analysis, or to build training data for a forecasting machine learning model. You can also perform sentiment analysis on scraped reviews, for example collecting customer reviews from e-commerce sites like Amazon to analyze sentiment and inform business decisions.

This can be further enhanced with price monitoring, like tracking product prices across multiple platforms to identify trends or trigger alerts for price drops. Furthermore, you can use Firecrawl to extract technical documentation to train or inform AI agents, improving their knowledge base, or data ingestion into multi-agent AI pipelines, like how CAMEL-AI does it.

It can also provide structured web data to retrieval-augmented generation (RAG) systems to enhance model performance and reduce hallucinations.

Advanced Techniques and Best Practices

To maximize Firecrawl’s potential, here are some advanced techniques and best practices:

  • Extract structured data for your LLM by using the /extract endpoint to return structured data based on a prompt or a schema, since LLMs better handle this type of data than raw HTML.

  • Handle your errors with retries by implementing retry logic with exponential backoff to manage transient API errors (see the sketch after this list).

  • Optimize the performance by using Firecrawl’s asynchronous crawling to start large crawls without blocking your application. This is ideal for web apps or services.

  • Filter crawls with parameters like max_depth or exclude to limit crawls to specific subdomains or page types. This will help reduce unnecessary data as needed.

  • Use Python tools to validate and clean the crawled data, for example by removing null values, deduplicating records, and normalizing fields. Libraries like pandas and numpy help with such tasks, and seaborn is useful for inspecting the results. If you want to learn more about data cleaning, consider our course on Cleaning Data in Python.
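
For the retry advice above, here is a small sketch of exponential backoff around a scrape call. The helper function is hypothetical; narrow the caught exceptions to whatever errors your SDK version actually raises:

import time

def scrape_with_retries(app, url, max_retries=3):
    """Retry a scrape with exponential backoff on transient failures."""
    for attempt in range(max_retries):
        try:
            return app.scrape_url(url, formats=['markdown'])
        except Exception as exc:  # narrow this to your SDK's error types
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)

result = scrape_with_retries(app, 'https://firecrawl.dev')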

Firecrawl vs. Traditional Scraping Tools

Firecrawl has numerous advantages that make it stand out over traditional scraping tools like Scrapy, BeautifulSoup, and Puppeteer. It handles JavaScript-rendered content effortlessly, while tools like BeautifulSoup require extras such as Selenium and sometimes manual inspection. It has built-in proxy rotation and rate-limit handling, which reduces setup time compared to Scrapy’s manual configuration. In addition, Firecrawl’s LLM extraction capabilities return structured data, a feature absent from most traditional tools.

However, traditional tools may be preferable in specific cases. If you want local execution, tools like BeautifulSoup run without API dependencies, which appeals to users who avoid cloud services. And ease of implementation sometimes comes at the cost of deep customization: Scrapy, for example, offers fine-grained control for highly customizable scraping pipelines.

Integrations and Ecosystem Support

Firecrawl integrates well with many of the common LLM frameworks, with robust ecosystem support, which enhances its utility in AI workflows. You can integrate with LangChain using the FirecrawlLoader module, which enables easy integration of web data into LangChain pipelines. Or, you can use it with LlamaIndex since the FirecrawlReader supports loading web data for indexing and querying. Firecrawl is also a built-in tool for crawling websites within CrewAI’s framework and supports many other frameworks like Flowise, Dify, and CAMEL-AI with broad compatibility.
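
As an example, here is a hedged sketch of the LangChain integration. The loader is exposed as FireCrawlLoader in langchain_community, though the exact import path and arguments can shift between LangChain releases:

from langchain_community.document_loaders import FireCrawlLoader

# Load a single page into LangChain Document objects
loader = FireCrawlLoader(
    api_key=api_key,  # your Firecrawl API key
    url="https://firecrawl.dev",
    mode="scrape"  # "crawl" fetches the whole site instead
)
docs = loader.load()
print(docs[0].page_content[:200])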

Pricing Plans and Usage Limits

Firecrawl offers a range of pricing plans to accommodate different needs and use cases. Make sure to refer to the official Firecrawl pricing page, in case things change.

  • Free tier: Includes a limited number of credits and is suitable for testing and small projects.
  • Hobby tier: Offers more credits for individuals or small teams at a low cost.
  • Standard, Growth, and Enterprise tiers: Provide increasing credits, higher rate limits, and advanced features like custom requests per minute (RPM).

The credit system works as follows:

| Plan | Credits | Features | Best For |
| --- | --- | --- | --- |
| Free | Limited | Basic scraping, testing | Beginners, small projects |
| Hobby | Moderate | More credits, standard features | Individual developers |
| Standard | High | Higher rate limits, advanced features | Growing teams |
| Growth | Very high | Scalable, custom options | Large projects |
| Enterprise | Unlimited | Custom RPM, dedicated support | High-volume, enterprise use |

Conclusion

Firecrawl is a game-changer for web data extraction, particularly for AI applications. By converting websites into clean, structured, LLM-ready data, it helps developers build smarter applications. Its ease of use, rich feature set, and extensive integrations make it a top choice for tasks ranging from price monitoring to RAG workflows.

I encourage you to explore Firecrawl’s free tier to test its capabilities. Stay engaged with the community on Firecrawl GitHub and Discord for updates, tips, and support.

With Firecrawl, web data is no longer a challenge. It’s a powerful asset for your AI innovations.



Author
Iheb Gafsi

I work on accelerated AI systems enabling edge intelligence with federated ML pipelines on decentralized data and distributed workloads. My work focuses on Large Models, Speech Processing, Computer Vision, Reinforcement Learning, and advanced ML Topologies.

FAQs

Why am I getting a timeout error?

First, check that your API key is set correctly; then verify that the page elements you are targeting actually exist. For slow pages, increase the wait times and the timeout parameter.

Does Firecrawl only use AI-powered search?

No. You can use different kinds of extraction, such as selector- and action-based scraping as well as LLM-based extraction.

Can Firecrawl handle websites with heavy JavaScript content?

Yes, Firecrawl renders JavaScript and is optimized for dynamic websites.

When might I choose a traditional tool like BeautifulSoup over Firecrawl?

When you need local execution without API dependencies, or only simple static HTML parsing, BeautifulSoup is free and sufficient.

Can I test Firecrawl for free?

Yes, the free tier includes a limited number of credits and requests per minute, which is enough for testing.
