Project Description
In this Project, you'll scrape a novel from Project Gutenberg, a website hosting a large corpus of books, using the Python package `requests`. Then you'll extract the text from this web data using `BeautifulSoup`. Finally, you'll analyze the distribution of words using the Natural Language Toolkit (`nltk`). The natural language processing tools used here apply to much of the data that data scientists encounter, because a large share of the world's data is unstructured and much of it is text. To complete this Project, you need to know how to import web data into Python and how to work with natural language text.
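A minimal sketch of the three steps just described (download, strip the HTML, count words), assuming `requests`, `beautifulsoup4`, and `nltk` are installed. The function names and the book URL are hypothetical, and the word-counting step uses nltk's `RegexpTokenizer` and `FreqDist` as one reasonable choice, not necessarily the approach the Project itself takes.

```python
import requests
from bs4 import BeautifulSoup
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical URL: assumed to point at an HTML edition of a novel on
# Project Gutenberg (book 2701); the exact path is an assumption and may change.
BOOK_URL = "https://www.gutenberg.org/files/2701/2701-h/2701-h.htm"


def fetch_html(url: str) -> str:
    """Download a web page with requests and return its HTML as a string."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def html_to_text(html: str) -> str:
    """Parse the HTML with BeautifulSoup and return the text without markup."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra deps
    return soup.get_text()


def word_frequencies(text: str) -> FreqDist:
    """Split text into lowercase word tokens and count them with nltk."""
    tokenizer = RegexpTokenizer(r"\w+")  # runs of letters, digits, underscores
    words = [token.lower() for token in tokenizer.tokenize(text)]
    return FreqDist(words)
```

Chained together, `word_frequencies(html_to_text(fetch_html(BOOK_URL))).most_common(10)` would return the ten most frequent words and their counts. Common function words such as "the" will likely dominate that list, so filtering stop words is a natural next step.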
Project Tasks
- 1. Introduction
Technologies
Python
Hadrien Lacroix
Curriculum Manager at DataCamp
Hadrien has collaborated on 30+ courses ranging from machine learning through data engineering to database administration. He's currently enrolled in a Master's in Analytics at Georgia Tech.
Hadrien started using DataCamp when the platform only had 27 courses. He then joined the Support team and helped students before becoming a Content Developer himself.