Premium project

Book Recommendations from Charles Darwin

Build a book recommendation system using NLP and the text of books like "On the Origin of Species."

12 Tasks, 1,500 XP


Project Description

Recommendation systems are at the heart of many products, such as Netflix or Amazon. They generally rely on metadata (e.g., the actors or director of a movie) or on user tastes (e.g., the movies you liked before) to determine which items you are most likely to enjoy. But when you are working with text-heavy datasets, you have access to a much richer resource: the whole text! In this project, you will learn how to build the basis of a book recommendation system that compares books by their content. You will use Charles Darwin's bibliography to find out which books might interest you. The dataset was manually collected from [Project Gutenberg](https://www.gutenberg.org).

Project Tasks

  1. Darwin's bibliography
  2. Load the contents of each book into Python
  3. Find "On the Origin of Species"
  4. Tokenize the corpus
  5. Stemming of the tokenized corpus
  6. Building a bag-of-words model
  7. The most common words of a given book
  8. Build a tf-idf model
  9. The results of the tf-idf model
  10. Compute distance between texts
  11. The book most similar to "On the Origin of Species"
  12. Which books have similar content?
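
To make the task list more concrete, here is a minimal sketch of the same pipeline (tokenize, stem, bag-of-words, tf-idf, similarity) using only the Python standard library. It is not the project's actual solution. The three short texts are invented placeholders rather than Project Gutenberg data, the `crude_stem` helper is a hypothetical stand-in for a real stemmer such as Porter's, and a real project would likely rely on dedicated NLP libraries instead.

```python
import math
import re
from collections import Counter

# Toy stand-ins for the book texts (invented snippets, not the real dataset).
books = {
    "OriginOfSpecies": "Species vary under natural selection, and selection "
                       "preserves favourable variations in species.",
    "DescentOfMan": "Man descended from earlier species, and sexual selection "
                    "shaped the variations among men.",
    "CoralReefs": "Coral reefs grow on sinking volcanic islands, forming "
                  "atolls and barrier reefs.",
}

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"and", "the", "in", "on", "from", "under", "among", "of", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def crude_stem(token):
    """Rough suffix stripping; a placeholder for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Count how often each stemmed token occurs in the text."""
    return Counter(crude_stem(t) for t in tokenize(text))


def tfidf_vectors(bows):
    """Weight each term count by log(N / document frequency)."""
    n_docs = len(bows)
    doc_freq = Counter(term for bow in bows.values() for term in bow)
    return {
        title: {term: count * math.log(n_docs / doc_freq[term])
                for term, count in bow.items()}
        for title, bow in bows.items()
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


bows = {title: bag_of_words(text) for title, text in books.items()}
vectors = tfidf_vectors(bows)


def most_similar(title):
    """Rank every other book by similarity to the given one, best first."""
    scores = {other: cosine_similarity(vectors[title], vec)
              for other, vec in vectors.items() if other != title}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = most_similar("OriginOfSpecies")
for other, score in ranking:
    print(f"{other}: {score:.3f}")
```

On this toy corpus, the placeholder "DescentOfMan" text ranks above "CoralReefs" because it shares stemmed terms such as "selection" with the query text, while the coral text shares none. With full-length books, the same ranking idea answers the question posed in the final task.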
Technologies
Python
Topics
Data Manipulation, Data Visualization, Probability & Statistics, Importing & Cleaning Data
Philippe Julien

Senior Data Scientist at King
Philippe is a Senior Data Scientist at King, where he uses his analytical skills to improve games such as Candy Crush Saga. Before that, he worked for eight years as a researcher in computational biology studying how genomes evolve. In general, he is interested in the creative use of data in fields as diverse as science, gaming, sport, or tech in general.
