Find Movie Similarity from Plot Summaries

Use NLP and clustering on movie plot summaries from IMDb and Wikipedia to quantify movie similarity.

Project Description

Natural Language Processing (NLP) is a field in which data scientists develop algorithms that make sense of the conversational language humans use. In this project, you will use NLP to measure how similar movies are to one another, based on their plot summaries from IMDb and Wikipedia.

The dataset contains the titles of the top 100 movies on IMDb as well as each movie's plot summary from both IMDb and Wikipedia.

Project Tasks

    Import and observe dataset
    Combine Wikipedia and IMDb plot summaries
    Combine tokenizing and stemming
    Create TfidfVectorizer
    Fit and transform with TfidfVectorizer
    Import KMeans and create clusters
    Calculate similarity distance
    Import Matplotlib, Linkage, and Dendrograms
    Create the merging (linkage) matrix and plot the dendrogram
    Which movies are most similar?
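
The core of the tasks above (combine the two summaries, tokenize and stem, weight terms with TF-IDF, compute a similarity distance, and look up the closest movie) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's own code: the three toy plots and the names `plots`, `tokenize_and_stem`, `tfidf_vectors`, and `most_similar` are hypothetical, and the suffix stripping is a crude stand-in for a real stemmer. The project itself names TfidfVectorizer and KMeans for the weighting and clustering steps. Stop-word removal, clustering, and the dendrogram are left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: title -> (Wikipedia plot, IMDb plot).
plots = {
    "Heist One": ("A crew of thieves plans a casino robbery.",
                  "Thieves rob a casino vault in a daring heist."),
    "Heist Two": ("A gang of thieves robs a bank vault.",
                  "The thieves plan a daring bank heist."),
    "Space Trip": ("Astronauts travel to a distant planet.",
                   "A crew of astronauts explores a planet in space."),
}


def tokenize_and_stem(text):
    """Lowercase, keep alphabetic tokens, strip a few common suffixes.

    The suffix rule is a deliberately crude placeholder for a real stemmer.
    """
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


titles = list(plots)
# Combine both summaries into one document per movie, then vectorize.
docs = [tokenize_and_stem(wiki + " " + imdb) for wiki, imdb in plots.values()]
vectors = tfidf_vectors(docs)
# Similarity distance: 0 for identical plots, approaching 1 for unrelated ones.
distance = [[1.0 - cosine_similarity(u, v) for v in vectors] for u in vectors]


def most_similar(title):
    """Return the title with the smallest distance to the given movie."""
    i = titles.index(title)
    others = (k for k in range(len(titles)) if k != i)
    return titles[min(others, key=lambda k: distance[i][k])]


print(most_similar("Heist One"))  # prints "Heist Two"
```

Sparse dictionaries keep the sketch dependency-free; a full version would typically pass the same pairwise distance matrix to a hierarchical-clustering routine to draw the dendrogram.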


Python


Data Manipulation, Data Visualization, Machine Learning, Probability & Statistics
Anubhav Singh

CTO & Co-founder, Dynopii

