Premium project

Book Recommendations from Charles Darwin

Build a book recommendation system using NLP and the text of books like "On the Origin of Species."

12 Tasks, 1,500 XP


Project Description

Recommendation systems are at the heart of many products, such as Netflix or Amazon. They generally rely on metadata (e.g., the actors or director of a movie) or on user tastes (e.g., the movies you liked before) to determine which items you are most likely to enjoy. But when you work with text-heavy datasets, you have access to a much richer resource: the whole text! In this project, you will learn how to build the foundation of a book recommendation system based on the books' content. You will use Charles Darwin's bibliography to find out which books might interest you.
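A minimal sketch of the content-based idea, assuming the simplest possible representation: each text becomes a bag of words (word counts), and two texts are compared with cosine similarity. The snippets in `texts` are toy strings written for illustration, not the project's dataset, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercase the text and count each run of letters as one token
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    # Dot product over the shared words, divided by the product of vector norms
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy snippets standing in for full book texts (illustrative only)
texts = {
    "species": "natural selection acts on variation within species",
    "variation": "variation of animals and plants under domestication and selection",
    "worms": "earthworms form vegetable mould through their actions",
}

# Score every other text against the "species" snippet and keep the best match
query = bag_of_words(texts["species"])
scores = {
    title: cosine_similarity(query, bag_of_words(text))
    for title, text in texts.items()
    if title != "species"
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Raw counts let frequent but uninformative words dominate the score, which is why a weighting scheme such as tf-idf is usually layered on top.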

The dataset was manually collected from Project Gutenberg.

Project Tasks

  1. Darwin's bibliography
  2. Load the contents of each book into Python
  3. Find "On the Origin of Species"
  4. Tokenize the corpus
  5. Stemming of the tokenized corpus
  6. Building a bag-of-words model
  7. The most common words of a given book
  8. Build a tf-idf model
  9. The results of the tf-idf model
  10. Compute distance between texts
  11. The book most similar to "On the Origin of Species"
  12. Which books have similar content?
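The tasks above follow a common text-similarity pipeline: tokenize, stem, count words, weight them with tf-idf, then rank books by similarity. A minimal, dependency-free sketch of those steps might look like this. It is not the project's own code: the stop-word list and `crude_stem` are hypothetical stand-ins (a real project would more likely use a proper stemmer such as the Porter algorithm), the tf-idf variant is raw count times log(N / df), and the "books" are just Darwin titles used as toy documents instead of full Project Gutenberg texts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"a", "and", "by", "for", "in", "of", "on", "the", "through", "to", "under", "with"}


def tokenize(text):
    # Lowercase, keep runs of letters, drop stop words
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def crude_stem(token):
    # Hypothetical suffix stripper standing in for a real stemmer
    for suffix in ("ation", "ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [crude_stem(t) for t in tokenize(text)]


def tfidf_vectors(docs):
    # docs maps a title to its list of stemmed tokens
    n_docs = len(docs)
    bows = {title: Counter(tokens) for title, tokens in docs.items()}
    # Document frequency: how many books contain each stem
    df = Counter()
    for bow in bows.values():
        df.update(bow.keys())
    # Weight = raw count * log(N / df); stems found in every book get weight 0
    return {
        title: {w: c * math.log(n_docs / df[w]) for w, c in bow.items()}
        for title, bow in bows.items()
    }


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(vectors, title):
    # Rank every other book by cosine similarity to the chosen one
    scores = [(other, cosine(vectors[title], vec)) for other, vec in vectors.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy corpus: book titles standing in for full texts
books = {
    "Origin": "On the Origin of Species by Means of Natural Selection",
    "Variation": "The Variation of Animals and Plants under Domestication",
    "Worms": "The Formation of Vegetable Mould through the Action of Worms",
    "Descent": "The Descent of Man, and Selection in Relation to Sex",
}

docs = {title: preprocess(text) for title, text in books.items()}
vectors = tfidf_vectors(docs)
ranking = most_similar(vectors, "Origin")
print(ranking[0])
```

With full book texts in `books`, the same functions would produce a ranking over real content; the idf factor is what keeps words shared by every book from driving the similarity.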


Python


Data Manipulation, Data Visualization, Probability & Statistics
Philippe Julien

Senior Data Scientist at King

Philippe is a Senior Data Scientist at King, where he uses his analytical skills to improve games such as Candy Crush Saga. Before that, he worked for eight years as a researcher in computational biology, studying how genomes evolve. More broadly, he is interested in the creative use of data in fields as diverse as science, gaming, sport, and tech.

