
Find Movie Similarity from Plot Summaries

Use NLP and clustering on movie plot summaries from IMDb and Wikipedia to quantify movie similarity.

12 Tasks · 1,500 XP


Project Description

Natural Language Processing (NLP) is a field in which data scientists develop algorithms that make sense of the conversational language humans use. In this project, you will use NLP to measure how similar movies are to one another, based on their plot summaries from IMDb and Wikipedia.

The dataset contains the titles of the top 100 movies on IMDb as well as each movie's plot summary from both IMDb and Wikipedia.

Project Tasks

  1. Import and observe dataset
  2. Combine Wikipedia and IMDb plot summaries
  3.
  4.
  5. Club together Tokenize & Stem
  6. Create TfidfVectorizer
  7. Fit transform TfidfVectorizer
  8. Import KMeans and create clusters
  9. Calculate similarity distance
  10. Import Matplotlib, Linkage, and Dendrograms
  11. Create merging and plot dendrogram
  12. Which movies are most similar?
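
To give a feel for what the vectorizing and similarity-distance tasks compute, here is a minimal sketch that uses only the Python standard library. It is not the project's own code: the project works with scikit-learn's TfidfVectorizer, while this sketch reimplements the idea by hand, borrowing only the smoothed IDF form that TfidfVectorizer uses by default. The helper names (tokenize, tfidf_vectors, similarity_distance, most_similar) are hypothetical, stemming is left out, and the three titles and plots are invented placeholders, not rows from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (no stemming here)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF dict (term -> weight) per document.

    IDF is the smoothed form ln((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def similarity_distance(u, v):
    """1 - cosine similarity; inputs are assumed to be unit-length vectors."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def most_similar(titles, plots):
    """Map each title to the title whose plot is at the smallest distance."""
    vecs = tfidf_vectors(plots)
    result = {}
    for i, title in enumerate(titles):
        others = [
            (similarity_distance(vecs[i], vecs[j]), titles[j])
            for j in range(len(titles))
            if j != i
        ]
        result[title] = min(others)[1]
    return result


# Invented placeholder data, for illustration only.
titles = ["Movie A", "Movie B", "Movie C"]
plots = [
    "A mafia boss hands the crime family to his reluctant son.",
    "A young man rises through a crime family and the mafia.",
    "A farm girl is swept away to a magical land of witches.",
]

nearest = most_similar(titles, plots)
print(nearest["Movie A"])
```

In the project itself, the same pairwise distances would typically feed the linkage and dendrogram steps; the nearest-neighbour lookup above is just the simplest way to read off which plots sit closest together.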


Python


Data Manipulation, Data Visualization, Machine Learning, Probability & Statistics
Anubhav Singh

CTO & Co-founder, Dynopii

A developer since the pre-Bootstrap era, Anubhav spent over a decade dealing with large-scale software complexities as a freelancer before embarking on his own AI startup venture, Dynopii Inc. He has authored two books, “Hands-on Python Deep Learning for Web” and “Mobile Deep Learning with TensorFlow Lite, ML Kit and Flutter”. He is also a Google Venkat Panchapakesan Memorial Scholar. Anubhav has contributed to several open-source projects and was a Google Summer of Code participant in 2019. He also leads the team at GDG Cloud Kolkata. You will often find him talking about system architecture, machine learning, and the web.

What do other learners have to say?