
YouTube Data and Sentiment Analysis

photo credit: Bing Image Creator

📑 Executive Summary

Analysis and Methods

  • Data cleaning: We removed duplicate rows, filled missing values with appropriate substitutes, and converted data types where necessary.
  • Exploratory data analysis: We explored the distribution of key metrics such as likes, comments, and views. We visualized each top keyword's share of the engagement metrics using pie charts, and generated a heatmap of engagement for each keyword by publication year.
  • Sentiment analysis: We first analyzed the sentiment values provided in the comments data frame. We then ran our own sentiment analysis with TextBlob, calculating the polarity and subjectivity of the comments and the video title for each video, and visualized the sentiment distribution for each keyword.
  • Custom scoring model: We developed a custom scoring model to rank videos based on normalized metrics. We assigned a weight to each metric to calculate a composite score for each video, then ranked videos both overall and within each keyword group.
  • Video ranking and recommendations: We ranked videos by the custom score to identify the top performers overall and for each keyword. We then filtered the top-ranked videos in the data science and machine learning categories to find the YouTube videos that would be particularly effective for an e-learning platform focused on data and AI skills. Finally, we recommended three specific videos and explained why each is ideal for promoting our e-learning platform.
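
The scoring and ranking steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact model used in the report: the toy data, the `WEIGHTS` values, and the `add_scores` helper are all assumptions, and it applies plain min-max normalization in pandas to the three engagement metrics only.

```python
import pandas as pd

# Toy data standing in for the cleaned videos table (values are made up).
videos = pd.DataFrame({
    "Video ID": ["a", "b", "c", "d"],
    "Keyword": ["data science", "data science",
                "machine learning", "machine learning"],
    "Likes": [100, 400, 50, 250],
    "Comments": [10, 40, 5, 20],
    "Views": [1_000, 9_000, 2_000, 5_000],
})

# Hypothetical weights; the report does not state the values it used.
WEIGHTS = {"Likes": 0.4, "Comments": 0.2, "Views": 0.4}


def add_scores(df, weights):
    """Min-max normalize each metric, combine with weights, and rank."""
    out = df.copy()
    metrics = list(weights)
    span = out[metrics].max() - out[metrics].min()
    # Guard against a zero range (all values equal) to avoid dividing by zero.
    normalized = (out[metrics] - out[metrics].min()) / span.replace(0, 1)
    out["Score"] = sum(normalized[m] * w for m, w in weights.items())
    # Rank 1 is the best video, both overall and within each keyword group.
    out["Overall Rank"] = out["Score"].rank(ascending=False, method="min").astype(int)
    out["Keyword Rank"] = (
        out.groupby("Keyword")["Score"].rank(ascending=False, method="min").astype(int)
    )
    return out


ranked = add_scores(videos, WEIGHTS).sort_values("Overall Rank")
print(ranked[["Video ID", "Keyword", "Score", "Overall Rank", "Keyword Rank"]])
```

Because every metric is scaled to the range 0 to 1 before weighting, a video with very high views cannot dominate the score simply because views are numerically larger than likes or comments.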

💾 The Data

The data for this competition is stored in two tables, videos_stats and comments.

videos_stats.csv

This table contains aggregated data for each YouTube video:

  • Video ID: A unique identifier for each video.
  • Title: The title of the video.
  • Published At: The publication date of the video.
  • Keyword: The main keyword or topic of the video.
  • Likes: The number of likes the video has received.
  • Comments: The number of comments on the video.
  • Views: The total number of times the video has been viewed.

comments.csv

This table captures details about comments made on YouTube videos:

  • Video ID: The identifier for the video the comment was made on (matches the Videos Stats table).
  • Comment: The text of the comment.
  • Likes: How many likes this comment has received.
  • Sentiment: The sentiment score ranges from 0 (negative) to 2 (positive), indicating the tone of a comment.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import seaborn as sns
from textblob import TextBlob
from sklearn.preprocessing import MinMaxScaler
videos_stats = pd.read_csv('videos_stats.csv')
videos_stats
comments = pd.read_csv('comments.csv')
comments
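
Since Video ID in the comments table matches Video ID in the videos table, the two can be joined to look at the provided sentiment values per keyword. The sketch below uses small made-up frames rather than the real files, and the `LABELS` mapping is an assumption: the data description only states that 0 is negative and 2 is positive, so treating 1 as neutral is a guess.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are made up for illustration).
videos = pd.DataFrame({
    "Video ID": ["v1", "v2"],
    "Keyword": ["data science", "music"],
})
comments_toy = pd.DataFrame({
    "Video ID": ["v1", "v1", "v2", "v2"],
    "Comment": ["Great tutorial", "Too fast", "Love it", "Nice"],
    "Sentiment": [2, 0, 2, 2],
})

# Assumed label meaning; the source only states 0 = negative and 2 = positive.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}

# Attach each comment to its video's keyword.
merged = comments_toy.merge(videos, on="Video ID", how="left")
merged["Sentiment Label"] = merged["Sentiment"].map(LABELS)

# Share of each sentiment label per keyword (each row sums to 1).
share = pd.crosstab(merged["Keyword"], merged["Sentiment Label"], normalize="index")
print(share)
```

A left join keeps every comment even if its video is missing from the videos table; such comments would simply have no keyword and drop out of the crosstab.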

💪 Competition challenge

Create a report that covers the following:

  1. Exploratory Data Analysis of YouTube Trends:

    • Conduct an initial analysis of YouTube video trends across different industries. This analysis should explore basic engagement metrics such as views, likes, and comments and identify which types of content are most popular in each industry.
  2. Sentiment Analysis of Video Comments:

    • Perform a sentiment analysis on video comments to measure viewer perceptions. This task involves basic processing of text data and visualizing sentiment trends across various video categories.
  3. Development of a Video Ranking Model:

    • Create a simple model that uses sentiment analysis results and traditional engagement metrics to rank videos. This model should help identify potentially valuable videos for specific industry sectors.
  4. Strategic Recommendation for E-Learning Collaboration:

    • Use your model's findings to identify YouTube videos that would be particularly effective for an E-Learning platform focused on Data and AI skills. Include recommendations for three specific videos, briefly explaining why each is ideal for promoting your E-Learning platform.

🔎 Dataset Overview

def info_df(df):
    data = []
    for column in df.columns:
        data.append({
            'Column_Name': column,
            'Data_Type': df[column].dtype,
            'Non-Null_Count': df[column].count(),
            'Null_Count': df[column].isna().sum(),
            'Percentage_NA': df[column].isna().mean() * 100,
            'Unique_Values_Count': df[column].nunique(),
        })
    result_df = pd.DataFrame(data)
    return result_df
display(info_df(videos_stats))
display(info_df(comments))

The datasets have very few missing values. First we drop the duplicated rows. Then we fill the missing values in the Likes, Comments and Views columns of the videos_stats dataset with the corresponding column median. The null value in the Comment column of the comments dataset is filled with a blank string (a single space). We also convert the Published At column to the datetime data type.

# Drop the duplicated rows (explicit copies, so the assignments below never write to the raw tables)
videos_stats_copy = videos_stats.drop_duplicates().copy()
comments_copy = comments.drop_duplicates().copy()
# Fill the missing values: column medians for the numeric metrics, a blank string for comments
metric_cols = ['Likes', 'Comments', 'Views']
videos_stats_copy[metric_cols] = videos_stats_copy[metric_cols].fillna(videos_stats_copy[metric_cols].median())
comments_copy['Comment'] = comments_copy['Comment'].fillna(' ')
# Convert to datetime data type
videos_stats_copy['Published At'] = pd.to_datetime(videos_stats_copy['Published At'], format='%d/%m/%Y')
# info() prints its summary and returns None, so call it directly rather than wrapping it in display()
videos_stats_copy.info()
comments_copy.info()

Cool. Now we don't have any missing values.
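
Rather than reading the info() output by eye, the same claim can be checked with assertions. The sketch below runs the cleaning steps on a small made-up frame; `toy`, `clean`, and the `assert_clean` helper are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one duplicate row and one missing value (made-up data).
toy = pd.DataFrame({
    "Video ID": ["v1", "v1", "v2", "v3"],
    "Likes": [10.0, 10.0, np.nan, 30.0],
    "Published At": ["01/02/2022", "01/02/2022", "15/03/2022", "20/04/2022"],
})

# Same steps as the cleaning cell: drop duplicates, fill with the median, parse dates.
clean = toy.drop_duplicates().copy()
clean["Likes"] = clean["Likes"].fillna(clean["Likes"].median())
clean["Published At"] = pd.to_datetime(clean["Published At"], format="%d/%m/%Y")


def assert_clean(df):
    """Raise AssertionError if duplicate rows or missing values remain."""
    assert not df.duplicated().any(), "duplicate rows remain"
    missing = df.isna().sum()
    assert missing.sum() == 0, f"missing values remain:\n{missing[missing > 0]}"


assert_clean(clean)
print(clean.dtypes)
```

The same helper could be called on videos_stats_copy and comments_copy after the cleaning cell, so that a later change to the cleaning logic fails loudly instead of silently leaving gaps.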

  1. 📈 Exploratory Data Analysis of YouTube Trends
