Book Recommendations from Charles Darwin

Build a book recommendation system using NLP and the text of books like "On the Origin of Species."

12 Tasks | 1,500 XP

Project Description

Recommendation systems are at the heart of many products, such as Netflix or Amazon. They generally rely on metadata (e.g., the actors or director of a movie) or on user tastes (e.g., the movies you liked before) to determine which items you are most likely to enjoy. But when you are working with text-heavy datasets, you have access to a much richer resource: the whole text! In this project, you will learn how to build the basis of a book recommendation system that compares books by their content. You will use Charles Darwin's bibliography to find out which of his books might interest you.

The dataset was manually collected from Project Gutenberg.

Project Tasks

1. Darwin's bibliography
2. Load the contents of each book into Python
3. Find "On the Origin of Species"
4. Tokenize the corpus
5. Stemming of the tokenized corpus
6. Building a bag-of-words model
7. The most common words of a given book
8. Build a tf-idf model
9. The results of the tf-idf model
10. Compute distance between texts
11. The book most similar to "On the Origin of Species"
12. Which books have similar content?
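
To give a feel for how these tasks fit together, here is a minimal sketch of the core pipeline (tokenize, bag-of-words, tf-idf, similarity) using only the Python standard library. It is not the project's own code: the three one-line "books", the tiny stop-word list, and the function names are all assumptions made for illustration, and the stemming step is left out to keep the example short.

```python
import math
import re
from collections import Counter

# Invented one-line stand-ins for full book texts (illustration only).
books = {
    "origin": "species vary and natural selection preserves favourable variations",
    "descent": "natural selection and sexual selection shaped the descent of man",
    "coral": "coral reefs grow upward as the ocean floor slowly subsides",
}

# A deliberately tiny, hypothetical stop-word list.
STOPWORDS = {"and", "the", "of", "as", "a", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(corpus):
    """Return {title: {term: tf-idf weight}} for every text in the corpus."""
    # Bag-of-words: raw term counts per book.
    bows = {title: Counter(tokenize(text)) for title, text in corpus.items()}
    n_docs = len(bows)
    # Document frequency: in how many books does each term appear?
    doc_freq = Counter(term for bow in bows.values() for term in bow)
    return {
        title: {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        }
        for title, bow in bows.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(title, corpus):
    """Rank every other book by similarity to the given one, best first."""
    vectors = tfidf_vectors(corpus)
    scores = {
        other: cosine(vectors[title], vec)
        for other, vec in vectors.items()
        if other != title
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = most_similar("origin", books)
print(ranking[0][0])  # the toy text sharing the most distinctive words
```

On this toy corpus, "descent" ranks first because it shares "natural" and "selection" with "origin", while "coral" shares no terms and scores zero. With full-length books, a stemmer and a proper stop-word list would matter much more than they do here.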

Technologies

Python

Topics

Data Manipulation, Data Visualization, Probability & Statistics

Philippe Julien

Senior Data Scientist at King

Philippe is a Senior Data Scientist at King, where he uses his analytical skills to improve games such as Candy Crush Saga. Before that, he worked for eight years as a researcher in computational biology, studying how genomes evolve. More broadly, he is interested in the creative use of data in fields as diverse as science, gaming, sport, and tech.

FAQs

Is this project suitable for beginners?

What is the programming language of this project?

Can I add this project to my Data Portfolio?

Do I need to download any software to complete this project?

What do other learners have to say?