Snscrape Tutorial: How to Scrape Social Media with Python

This snscrape tutorial shows you how to install, use, and troubleshoot snscrape. You'll learn to scrape tweets, Facebook posts, Instagram hashtags, and subreddits.

May 2024 · 8 min read

Snscrape is a powerful Python library that lets you scrape data from various social networking services (SNS) like Facebook, Twitter, Instagram, Reddit, and others.

This focus on social media allows snscrape to excel in areas where general web scraping tools might struggle. Social media platforms often have unique data structures and APIs that snscrape understands.

This translates to cleaner, more reliable data extraction compared to generic web scrapers that may need to work around these platform-specific features.

If you want to learn more about generic web scraping, check out these courses on web scraping in Python and web scraping in R.

Snscrape in a Nutshell

Here's a glimpse into what we can scrape with snscrape:

User profiles: Gather public profile information across various platforms, including bios, follower counts, and post history.
Posts and content: Scrape tweets, Instagram posts, Reddit submissions, and more, depending on the platform. You can even target specific hashtags, locations, or searches to focus your data collection.
Social groups and communities: Delve into Facebook groups, Reddit subreddits, or Telegram channels to analyze group discussions and interactions.

This is a summary of the specific data types supported for each platform (based on snscrape's official documentation):

Platform	Supported Data
Facebook	User profiles, Groups, Communities
Instagram	User profiles, Hashtags, Locations
Mastodon	User profiles, Toots (single or thread)
Reddit	Users, Subreddits, Searches
Telegram	Channels
Twitter	Users, User profiles, Hashtags, Searches, Tweets (single or thread), List posts, Communities, Trends
VKontakte	User profiles
Weibo (Sina Weibo)	User profiles

How to Install Snscrape

To get started with snscrape, we first need to install it with pip:

$ pip install snscrape

If you work in a conda environment, run the same pip command inside the activated environment.

Snscrape requires Python 3.8 or higher, and you might need to install the libxml2 and libxslt libraries as well.
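
Before installing, a quick check along these lines can confirm that the interpreter meets the version requirement. This is a minimal sketch; the helper name meets_minimum is only illustrative and is not part of snscrape.

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version stated above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is recent enough for snscrape.")
    else:
        print("Please upgrade to Python 3.8 or higher before installing snscrape.")
```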

Once installed, we're ready to start scraping social media data.

How to Use Snscrape

One of snscrape's strengths is its command-line interface (CLI), which offers simplicity and efficiency in fetching social media data.

Let's start with an example of scraping Facebook data. Suppose we want to collect Facebook posts from a specific page.

We can do this from the command line:

$ snscrape facebook-user "page_name" > posts.txt

In this example, we replace "page_name" with the username of the Facebook page we want to scrape (the part that follows facebook.com/ in its URL). The results are saved in a file named posts.txt. Running snscrape --help lists the scrapers available in the installed version.

If you’re anything like me, you might prefer to use scripts for complex or repeated tasks. Fortunately, you can easily use snscrape within a Python script:

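import subprocess

# Username of the Facebook page (the part after facebook.com/ in its URL)
page_name = "your_page_name_here"

# Run snscrape and write its output to posts.txt.
# Passing the arguments as a list avoids shell-quoting problems,
# and check=True raises an error if snscrape exits with a failure code.
with open("posts.txt", "w", encoding="utf-8") as outfile:
    subprocess.run(["snscrape", "facebook-user", page_name], stdout=outfile, check=True)

# Posts are saved in the 'posts.txt' file
print(f"Posts from '{page_name}' have been saved to 'posts.txt'.")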

Keep in mind that many social media platforms, including Facebook, discourage unauthorized scraping and actively work against it. You can read more about Facebook’s thoughts on this subject in this article.

Snscrape: Advanced Techniques

Snscrape offers us advanced features for fine-tuning the data collection process. For example, we can specify the number of results to scrape, filter by date range, or target specific users or hashtags.

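Here's an example that returns only posts newer than a given date, using the global --since option (global options go before the scraper name):

$ snscrape --since 2023-01-01 facebook-user "page_name" > posts.txt

The --since option sets only a lower bound. To restrict the results to a range such as the year 2023, we can discard anything dated after 2023-12-31 once the data has been collected.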
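
A date range can also be applied on the Python side after the data has been collected. The sketch below assumes the results were saved as JSON Lines (the --jsonl option covered under Global Options) and that each record carries an ISO 8601 timestamp in a field named "date". Both the field name and the helper filter_by_date are assumptions for illustration, so check them against your own output.

```python
import json
from datetime import datetime, timezone


def filter_by_date(lines, start, end, date_field="date"):
    """Yield parsed records whose timestamp falls within [start, end]."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        raw = record.get(date_field)
        if raw is None:
            continue  # skip records without a timestamp
        posted = datetime.fromisoformat(raw)
        if posted.tzinfo is None:
            # Assumption: timestamps without an offset are treated as UTC
            posted = posted.replace(tzinfo=timezone.utc)
        if start <= posted <= end:
            yield record
```

To use it, open the JSON Lines file and pass the file object as lines, together with timezone-aware start and end datetimes.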

We can also scrape data from other platforms like Reddit using similar commands tailored to each platform's syntax. As we mentioned in a previous section, snscrape currently supports scraping from:

Facebook
Instagram
Mastodon
Reddit
Telegram
Twitter
VKontakte
Weibo (Sina Weibo)

Global Options

Snscrape has a variety of global options we can use to customize our scraping.

For instance, if we need JSON Lines instead of plain-text output, we can use the --jsonl global option. Global options go before the scraper name:

$ snscrape --jsonl facebook-user "page_name" > posts.jsonl
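
Once the results are in JSON Lines format, each line can be parsed independently in Python. The following is a minimal sketch that uses only the standard library; load_jsonl is an illustrative helper name, and the fields inside each record depend on the scraper, so inspect one record before relying on any particular key.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file and return a list with one dict per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"Line {line_number} of {path} is not valid JSON") from exc
    return records
```

Calling load_jsonl("posts.jsonl") then gives a list of dictionaries that can be passed on to pandas or any other analysis tool.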

Another example is limiting the number of results we collect with the --max-results option. This is particularly useful if we have limited computing resources or the page we’re scraping has a large volume of data:

$ snscrape --max-results 50 facebook-user "page_name" > limited_posts.txt

We can also include information about the entity being scraped (for example, the user or channel) by passing the --with-entity option. Not every scraper supports this option:

$ snscrape --with-entity facebook-user "page_name" > posts_with_entity.txt
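
When these options are combined in a script, a small helper can keep the argument order consistent. The sketch below assumes global options are placed before the scraper name, and it passes the scraper name and target through unchanged. The names build_command and run_to_file are illustrative and not part of snscrape.

```python
import subprocess


def build_command(scraper, target, jsonl=False, max_results=None, with_entity=False):
    """Build an snscrape argument list with global options before the scraper name."""
    command = ["snscrape"]
    if jsonl:
        command.append("--jsonl")
    if max_results is not None:
        command += ["--max-results", str(max_results)]
    if with_entity:
        command.append("--with-entity")
    command += [scraper, target]
    return command


def run_to_file(command, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "w", encoding="utf-8") as outfile:
        # check=True raises CalledProcessError if the command fails
        subprocess.run(command, stdout=outfile, check=True)
```

Building the command as a list, rather than as one shell string, means that targets containing spaces or quotes do not need extra escaping.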

Use Cases for Snscrape

Snscrape's ability to extract data from social media platforms opens doors to many applications across various domains. Let’s explore some compelling use cases for researchers and businesses.

Research and Academia

Scraping social networking services has the following use cases for research and academia:

Social listening and sentiment analysis: Snscrape empowers researchers to analyze public opinion and gauge sentiment surrounding specific events—from global conferences and political debates to natural disasters. This data can be used to understand public perception, identify emerging trends, and inform decision-making. The extracted information can also be valuable for stock traders in predicting market reactions to such events.

Network analysis and community detection: By scraping social connections and interactions, researchers can map social networks, identify influential users, and understand how information flows within communities. This knowledge can be useful for studying online movements, social influence, and the spread of information.

Businesses and Marketing

Scraping social media data can help businesses with:

Brand monitoring and reputation management: Businesses can leverage snscrape to track online mentions of their brand and monitor customer sentiment. This allows them to identify potential crises, address customer concerns promptly, and measure the effectiveness of their marketing campaigns.
Competitor analysis and market research: By scraping data from competitor profiles and industry forums, businesses can gain valuable insights into competitor strategies, customer preferences, and emerging trends within their market. This knowledge can be used to refine marketing strategies, develop competitive advantages, and optimize product offerings.

Ethical Considerations

Upholding user privacy and maintaining ethical standards is essential when scraping data.

Before starting any scraping project, read the platform’s terms of service and have a plan for protecting users’ data privacy. You are responsible for ensuring that your data collection activities comply with the platforms' usage policies and guidelines, such as rate limits, data access restrictions, and content usage permissions. By adhering to these terms of service, you can avoid potential legal implications and uphold ethical standards in data collection practices.

Ethical considerations extend to the responsible handling and storage of scraped data. To safeguard sensitive information, you should implement robust data management practices, including encryption, anonymization, and secure storage protocols. Once you have the data, it is your responsibility to make sure it’s not misused.
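
As one illustration of the anonymization step, usernames can be replaced with keyed hashes before the data is stored or shared. This is a sketch under assumptions: the field name "username" and the helper names are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymize(username, secret_key):
    """Return a stable 16-character pseudonym for a username using HMAC-SHA256."""
    digest = hmac.new(secret_key, username.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_records(records, secret_key, field="username"):
    """Return copies of the records with `field` replaced by its pseudonym."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        if copy.get(field) is not None:
            copy[field] = pseudonymize(str(copy[field]), secret_key)
        cleaned.append(copy)
    return cleaned
```

The secret key should be a bytes value kept outside the dataset, for example in an environment variable, so that the pseudonyms cannot be reversed from the stored data alone.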

Snscrape: Issues and Troubleshooting

While snscrape offers robust functionality for scraping social media data, we may run into issues such as authentication errors, rate limiting, parsing errors, and blocked requests.

Authentication errors

We might encounter errors related to invalid usernames, passwords, or API keys (depending on the platform).

We need to double-check our credentials for typos or expired tokens. We can also refer to the official documentation for specific authentication requirements for each platform we're scraping.

Rate limiting

Social media platforms often have rate limits to prevent excessive scraping. We may see error messages indicating we've exceeded the allowed requests per timeframe.

We need to be mindful of rate limits and adjust the scraping speed accordingly. Running snscrape --help shows the options the installed version supports, such as --retry for retrying failed requests. We can also add our own pauses between runs and scrape data in smaller batches spread over time.
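
Spreading work over time can be sketched in Python as a loop that pauses between targets and backs off when a call fails. This is a generic pattern rather than an snscrape feature: scrape_one stands for any function we supply (for example, one that runs an snscrape command for a single page), and the delay values are arbitrary placeholders.

```python
import time


def run_in_batches(targets, scrape_one, delay_seconds=5.0, max_retries=3, sleep=time.sleep):
    """Call scrape_one(target) for each target, pausing between calls and retrying with backoff."""
    results = {}
    for index, target in enumerate(targets):
        if index > 0:
            sleep(delay_seconds)  # fixed pause between targets
        for attempt in range(max_retries + 1):
            try:
                results[target] = scrape_one(target)
                break
            except Exception:
                # Broad catch for the sketch; narrow this to the errors you expect
                if attempt == max_retries:
                    raise
                sleep(delay_seconds * 2 ** (attempt + 1))  # exponential backoff before retrying
    return results
```

The sleep parameter exists so that the waiting behavior can be replaced in tests without actually pausing.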

Data parsing errors

Unexpected changes in the platform's structure or layout can lead to parsing errors, where snscrape struggles to interpret the extracted data.

We need to stay updated with the latest snscrape releases, as developers often address these issues with platform updates. We can consult the GitHub repository for reported issues and potential workarounds.

Access denied

In some cases, the platform might block scraping attempts entirely.

We need to respect the terms of service of each platform and avoid scraping excessively or targeting sensitive data. If scraping is strictly prohibited, we need to consider alternative data sources or adjust our research approach.

Troubleshooting

Here are some tips for smoother scraping:

Start small: Begin with small scraping tasks to test your commands and identify any potential issues before attempting larger data collections.
Read the documentation: The official snscrape documentation offers valuable insights into scraper-specific options and best practices. Refer to it frequently to troubleshoot and optimize your scraping experience.
Join the community: The snscrape community on GitHub is a great resource for finding solutions to common problems and learning from other users' experiences.

Conclusion

In this tutorial, we covered the fundamentals of using snscrape to extract data from various social networking services. We learned to install and use snscrape through CLI and Python, and we explored use cases, ethical considerations, and troubleshooting techniques.

You can continue your learning journey by going deeper into topics like sentiment analysis, data ethics, or social media data analysis.