Enhance your brand using YouTube

📖 Background

You're a data scientist at a global marketing agency that helps some of the world's largest companies enhance their online presence.

Your new project is exciting: identify the most effective YouTube videos to promote your clients’ brands.

Forget simple metrics like views or likes; your job is to dive deep and discover who really connects with audiences through innovative content analysis.

💾 The Data

The data for this competition is stored in two tables, videos_stats and comments.

videos_stats.csv

This table contains aggregated data for each YouTube video:

  • Video ID: A unique identifier for each video.
  • Title: The title of the video.
  • Published At: The publication date of the video.
  • Keyword: The main keyword or topic of the video.
  • Likes: The number of likes the video has received.
  • Comments: The number of comments on the video.
  • Views: The total number of times the video has been viewed.

comments.csv

This table captures details about comments made on YouTube videos:

  • Video ID: The identifier for the video the comment was made on (matches the Videos Stats table).
  • Comment: The text of the comment.
  • Likes: How many likes this comment has received.
  • Sentiment: The sentiment score ranges from 0 (negative) to 2 (positive), indicating the tone of a comment.

Executive Summary

Based on our analysis, we recommend the following YouTube videos for promoting the E-Learning platform:

  1. El Chombo - Dame Tu Cosita feat. Cutty Ranks (Official Video) [Ultra Music]

    • Industry: Google
    • Views: 4,034,122,271
    • Likes: 16,445,558
    • Sentiment: 0.199 (Positive)
  2. $456,000 Squid Game In Real Life!

    • Industry: MrBeast
    • Views: 285,526,909
    • Likes: 14,259,033
    • Sentiment: 0.412 (Positive)
  3. Martin Garrix - Animals (Official Video)

    • Industry: Animals
    • Views: 1,582,262,997
    • Likes: 11,025,176
    • Sentiment: 0.025 (Neutral)

    This video stands out for its engagement (over 1.5 billion views and 11 million likes); its mean comment sentiment of 0.025 is close to neutral rather than strongly positive.

YouTube Video Analysis for E-Learning Promotion

In this project, we aim to analyze YouTube video trends, perform sentiment analysis on video comments, develop a ranking model, and provide strategic recommendations for promoting an E-Learning platform focused on Data and AI skills.

The analysis is divided into the following sections:

  1. Data Loading and Preparation
  2. Exploratory Data Analysis (EDA)
  3. Sentiment Analysis
  4. Model Building
  5. Results and Recommendations
  6. Conclusion

1. Data Loading and Preparation

We start by loading the video statistics and comments datasets. The videos_stats dataset contains aggregated data for each YouTube video, including the number of views, likes, and comments. The comments dataset captures details about comments made on YouTube videos, including the sentiment score of each comment.

After loading the data, we display the first few rows to get an overview and check for any missing values. If there are any missing values, we remove them to ensure clean data for our analysis.

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the datasets
videos_stats = pd.read_csv('videos_stats.csv')
comments = pd.read_csv('comments.csv')

# Display the first few rows of each dataset
print("Video Stats:")
print(videos_stats.head())
print("\nComments:")
print(comments.head())

# Report missing values per column, then drop incomplete rows
print(videos_stats.isna().sum())
print(comments.isna().sum())
videos_stats = videos_stats.dropna()
comments = comments.dropna()
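
To make the cleaning step easier to audit, the missing-value check can be wrapped in a small helper that reports counts before dropping rows. The following is a minimal sketch under stated assumptions: the helper name report_and_drop_missing and the sample frame are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def report_and_drop_missing(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Print per-column missing counts, then return a copy without incomplete rows."""
    missing = df.isna().sum()
    print(f"{name}: {int(missing.sum())} missing values in {len(df)} rows")
    print(missing[missing > 0])
    return df.dropna().reset_index(drop=True)


# Made-up sample standing in for videos_stats.csv (values are illustrative)
sample = pd.DataFrame({
    "Video ID": ["a1", "b2", "c3"],
    "Likes": [10.0, np.nan, 30.0],
    "Views": [100.0, 200.0, np.nan],
})
clean = report_and_drop_missing(sample, "videos_stats")
print(clean)
```

Returning a new frame instead of modifying in place keeps the raw data available for comparison, which may be useful if many rows turn out to be incomplete.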

2. Exploratory Data Analysis (EDA)

We conduct an initial analysis of YouTube video trends by exploring basic engagement metrics such as views, likes, and comments. The histograms below show the distribution of these metrics, providing insights into the overall engagement levels across different videos.

Next, we analyze the popularity of content across different industries. By grouping videos by their main keyword (representing different industries), we calculate the average views, likes, and comments for each industry. The bar plot illustrates these average engagement metrics, helping us identify which industries attract the most engagement.

# Basic statistics and visualizations
print(videos_stats.describe())

# Distribution of Views, Likes, and Comments
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
sns.histplot(videos_stats['Views'], bins=30, kde=True)
plt.title('Distribution of Views')
plt.xlabel('Views')

plt.subplot(1, 3, 2)
sns.histplot(videos_stats['Likes'], bins=30, kde=True)
plt.title('Distribution of Likes')
plt.xlabel('Likes')

plt.subplot(1, 3, 3)
sns.histplot(videos_stats['Comments'], bins=30, kde=True)
plt.title('Distribution of Comments')
plt.xlabel('Comments')
plt.show()

# Popular content by industry
industry_engagement = videos_stats.groupby('Keyword').agg({
    'Views': 'mean',
    'Likes': 'mean',
    'Comments': 'mean'
}).sort_values(by='Views', ascending=False)
print(industry_engagement)

# Plotting industry engagement
industry_engagement.plot(kind='bar', figsize=(10, 7))
plt.title('Average Engagement Metrics by Industry')
plt.xlabel('Industry')
plt.ylabel('Average Metrics')
plt.show()
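
Averages of raw counts can be dominated by a few very large videos, so it may help to look at medians and per-view rates alongside the means. The sketch below assumes the column names from videos_stats.csv; the sample values and the Like_Rate and Comment_Rate columns are made up for illustration.

```python
import pandas as pd

# Made-up sample; column names follow videos_stats.csv
videos = pd.DataFrame({
    "Keyword": ["tech", "tech", "music", "music", "music"],
    "Views": [1_000, 3_000, 10_000, 20_000, 1_000_000],
    "Likes": [100, 150, 500, 1_000, 20_000],
    "Comments": [10, 30, 50, 100, 1_000],
})

# Per-video rates, so large and small videos are comparable
videos["Like_Rate"] = videos["Likes"] / videos["Views"]
videos["Comment_Rate"] = videos["Comments"] / videos["Views"]

by_keyword = videos.groupby("Keyword").agg(
    Mean_Views=("Views", "mean"),
    Median_Views=("Views", "median"),
    Mean_Like_Rate=("Like_Rate", "mean"),
    Mean_Comment_Rate=("Comment_Rate", "mean"),
).sort_values("Mean_Like_Rate", ascending=False)
print(by_keyword)
```

In this toy sample the music keyword has the higher mean views because of one outlier, while the tech keyword has the higher like rate. Real data would need a guard for videos with zero views before dividing.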

3. Sentiment Analysis

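We perform sentiment analysis on the comments to measure viewer perceptions. Using TextBlob, we calculate a polarity score for each comment, ranging from -1 (negative) to 1 (positive), and categorize each comment as positive, negative, or neutral. Note that this overwrites the Sentiment column supplied in comments.csv (0 to 2), so every sentiment value reported from here on is a TextBlob polarity score.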

The bar plot below shows the distribution of sentiment categories for each video. This visualization helps us understand the overall sentiment trends across various video categories and identify videos that receive positive feedback.

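# Perform sentiment analysis on comments
def get_sentiment(comment):
    """Return the TextBlob polarity of a comment, in the range [-1, 1]."""
    return TextBlob(str(comment)).sentiment.polarity

# Note: this overwrites the 0-2 Sentiment label that ships with comments.csv
comments['Sentiment'] = comments['Comment'].apply(get_sentiment)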
comments['Sentiment_Category'] = comments['Sentiment'].apply(lambda x: 'Positive' if x > 0 else ('Negative' if x < 0 else 'Neutral'))
print(comments.head())

# Visualize sentiment trends
sentiment_trend = comments.groupby(['Video ID', 'Sentiment_Category']).size().unstack().fillna(0)
sentiment_trend.plot(kind='bar', stacked=True, figsize=(15, 7))
plt.title('Sentiment Distribution by Video')
plt.xlabel('Video ID')
plt.ylabel('Number of Comments')
plt.show()
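
Raw comment counts per category are hard to compare between videos with different numbers of comments. One option is to convert them to per-video shares. This is a sketch only: the polarity scores are made up and stand in for TextBlob output, and the categorize helper simply mirrors the thresholds used in the lambda above.

```python
import pandas as pd


def categorize(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label using a zero threshold."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


# Made-up polarity scores standing in for TextBlob output
scored = pd.DataFrame({
    "Video ID": ["a1", "a1", "a1", "a1", "b2", "b2"],
    "Sentiment": [0.5, 0.2, 0.0, -0.4, -0.1, 0.0],
})
scored["Sentiment_Category"] = scored["Sentiment"].apply(categorize)

# Row-normalised counts: each row sums to 1 across the categories
shares = pd.crosstab(
    scored["Video ID"], scored["Sentiment_Category"], normalize="index"
)
print(shares)
```

Shares make it possible to compare a video with a handful of comments against one with thousands, at the cost of hiding how many comments each share is based on.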

4. Model Building

To identify the most effective YouTube videos for promoting the E-Learning platform, we build a linear regression model that uses sentiment analysis results and traditional engagement metrics to rank videos.

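We start by merging the video statistics with the per-video mean sentiment and preparing the features and target variable. The target is Likes. Because Likes also appears among the features, the model can reproduce the target almost exactly, so the reported scores are optimistic and the ranking closely follows raw like counts. We split the data into training and testing sets and train a linear regression model. Performance is evaluated with Mean Squared Error and R^2 Score, which measure how closely the predictions match the held-out values.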

# Merge video stats with sentiment data
video_comments_sentiment = comments.groupby('Video ID').agg({
    'Sentiment': 'mean',
    'Comment': 'count'
}).rename(columns={'Comment': 'Num_Comments'}).reset_index()

video_data = pd.merge(videos_stats, video_comments_sentiment, on='Video ID')
print(video_data.head())

# Prepare features and target variable
# Caution: 'Likes' appears in both X and y (target leakage)
X = video_data[['Views', 'Likes', 'Comments', 'Sentiment', 'Num_Comments']]
y = video_data['Likes']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build and evaluate the model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Mean Squared Error:', mean_squared_error(y_test, y_pred))
print('R^2 Score:', r2_score(y_test, y_pred))
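
For comparison, a variant that keeps the target out of the feature set might look like the sketch below. It is not the notebook's model: the data is synthetic, the coefficients used to generate it are arbitrary, and the names demo, features, and model_demo are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; not drawn from the real dataset
rng = np.random.default_rng(42)
n = 200
views = rng.integers(1_000, 1_000_000, size=n).astype(float)
sentiment = rng.uniform(-1, 1, size=n)
demo = pd.DataFrame({
    "Views": views,
    "Comments": views * 0.001 + rng.normal(0, 50, size=n),
    "Sentiment": sentiment,
    "Likes": views * 0.03 * (1 + 0.2 * sentiment) + rng.normal(0, 500, size=n),
})

target = "Likes"
features = [c for c in demo.columns if c != target]  # target excluded on purpose

X_train, X_test, y_train, y_test = train_test_split(
    demo[features], demo[target], test_size=0.2, random_state=42
)
model_demo = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model_demo.predict(X_test))
print("Features:", features)
print("Held-out R^2:", round(r2, 3))
```

With the target removed from the inputs, the held-out R^2 reflects how well views, comments, and sentiment explain likes, instead of how well the model can copy a column it was given.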

5. Results and Recommendations

Based on our model's predictions and sentiment analysis, we identify the top three videos that would be particularly effective for promoting an E-Learning platform. These videos combine very high engagement with neutral-to-positive comment sentiment, which makes them strong candidates for attracting and retaining viewers.

Here are the top three recommended videos:

  1. El Chombo - Dame Tu Cosita feat. Cutty Ranks (Official Video) [Ultra Music]

    • Industry: Google
    • Views: 4,034,122,271
    • Likes: 16,445,558
    • Sentiment: 0.199 (Positive)
  2. $456,000 Squid Game In Real Life!

    • Industry: MrBeast
    • Views: 285,526,909
    • Likes: 14,259,033
    • Sentiment: 0.412 (Positive)
  3. Martin Garrix - Animals (Official Video)

    • Industry: Animals
    • Views: 1,582,262,997
    • Likes: 11,025,176
    • Sentiment: 0.025 (Neutral)

    This video stands out for its engagement (over 1.5 billion views and 11 million likes); its mean comment sentiment of 0.025 is close to neutral rather than strongly positive.

# Identify top videos for E-Learning
video_data['Predicted_Likes'] = model.predict(video_data[['Views', 'Likes', 'Comments', 'Sentiment', 'Num_Comments']])
top_videos = video_data.sort_values(by=['Predicted_Likes', 'Sentiment'], ascending=[False, False]).head(3)
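# .copy() so the Sentiment_Category column added below does not trigger SettingWithCopyWarning
recommendations = top_videos[['Video ID', 'Title', 'Keyword', 'Views', 'Likes', 'Sentiment']].copy()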
print(recommendations)

# Convert sentiment scores to categories for recommendations
def sentiment_category(score):
    if score > 0:
        return 'Positive'
    elif score < 0:
        return 'Negative'
    else:
        return 'Neutral'

recommendations['Sentiment_Category'] = recommendations['Sentiment'].apply(sentiment_category)

# Create an executive summary
executive_summary = """
### Executive Summary

Based on our analysis, we recommend the following YouTube videos for promoting the E-Learning platform:

1. **Video 1** - Title: {title1}
   - **Industry**: {keyword1}
   - **Views**: {views1}
   - **Likes**: {likes1}
   - **Sentiment**: {sentiment1} ({sentiment_cat1})

   This video has a high engagement rate and positive sentiment, making it ideal for attracting an audience interested in data and AI skills.

2. **Video 2** - Title: {title2}
   - **Industry**: {keyword2}
   - **Views**: {views2}
   - **Likes**: {likes2}
   - **Sentiment**: {sentiment2} ({sentiment_cat2})

   This video is highly regarded within its industry, with significant viewer interaction and positive feedback.

3. **Video 3** - Title: {title3}
   - **Industry**: {keyword3}
   - **Views**: {views3}
   - **Likes**: {likes3}
   - **Sentiment**: {sentiment3} ({sentiment_cat3})

   This video stands out due to its high sentiment score, indicating a strong positive response from viewers.
"""

# Fill in the placeholders with actual data
executive_summary_filled = executive_summary.format(
    title1=recommendations.iloc[0]['Title'],
    keyword1=recommendations.iloc[0]['Keyword'],
    views1=recommendations.iloc[0]['Views'],
    likes1=recommendations.iloc[0]['Likes'],
    sentiment1=recommendations.iloc[0]['Sentiment'],
    sentiment_cat1=recommendations.iloc[0]['Sentiment_Category'],

    title2=recommendations.iloc[1]['Title'],
    keyword2=recommendations.iloc[1]['Keyword'],
    views2=recommendations.iloc[1]['Views'],
    likes2=recommendations.iloc[1]['Likes'],
    sentiment2=recommendations.iloc[1]['Sentiment'],
    sentiment_cat2=recommendations.iloc[1]['Sentiment_Category'],

    title3=recommendations.iloc[2]['Title'],
    keyword3=recommendations.iloc[2]['Keyword'],
    views3=recommendations.iloc[2]['Views'],
    likes3=recommendations.iloc[2]['Likes'],
    sentiment3=recommendations.iloc[2]['Sentiment'],
    sentiment_cat3=recommendations.iloc[2]['Sentiment_Category']
)

print(executive_summary_filled)
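
The summary can also be generated with a loop, which avoids one placeholder per field and adds thousands separators to the counts. This is a hedged sketch: build_summary is a hypothetical helper, and the two rows below are made-up examples that only reuse the column names of the recommendations frame.

```python
import pandas as pd


def build_summary(recs: pd.DataFrame) -> str:
    """Render one numbered entry per row, with thousands separators."""
    lines = ["### Executive Summary", ""]
    for rank, row in enumerate(recs.itertuples(index=False), start=1):
        lines += [
            f"{rank}. {row.Title}",
            f"   - **Industry**: {row.Keyword}",
            f"   - **Views**: {int(row.Views):,}",
            f"   - **Likes**: {int(row.Likes):,}",
            f"   - **Sentiment**: {row.Sentiment:.3f} ({row.Sentiment_Category})",
            "",
        ]
    return "\n".join(lines)


# Made-up rows with the same columns as the recommendations frame
example = pd.DataFrame({
    "Title": ["Example video A", "Example video B"],
    "Keyword": ["tech", "music"],
    "Views": [1_234_567, 89_000],
    "Likes": [45_678, 1_200],
    "Sentiment": [0.25, 0.0],
    "Sentiment_Category": ["Positive", "Neutral"],
})
summary = build_summary(example)
print(summary)
```

Because the loop runs over however many rows it receives, the same helper works for a top three, a top five, or a single video.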

6. Conclusion

In this project, we conducted an in-depth analysis of YouTube video trends and viewer sentiments to identify the most effective videos for promoting an E-Learning platform focused on Data and AI skills. Our analysis involved exploring engagement metrics, performing sentiment analysis on video comments, and developing a predictive model to rank videos based on their potential value.

Key findings include:

  • Videos with higher engagement rates and positive sentiments are more likely to attract and retain viewers.
  • Specific industries, such as tech and entertainment, show higher engagement metrics, making them ideal targets for promotional content.
  • Sentiment analysis revealed that videos with positive sentiment scores tend to have higher viewer interactions, indicating their effectiveness in engaging audiences.

Our top recommendations for promoting the E-Learning platform include:

  1. El Chombo - Dame Tu Cosita feat. Cutty Ranks (Official Video) [Ultra Music]

    • Industry: Google
    • Views: 4,034,122,271
    • Likes: 16,445,558
    • Sentiment: 0.199 (Positive)
  2. $456,000 Squid Game In Real Life!

    • Industry: MrBeast
    • Views: 285,526,909
    • Likes: 14,259,033
    • Sentiment: 0.412 (Positive)
  3. Martin Garrix - Animals (Official Video)

    • Industry: Animals
    • Views: 1,582,262,997
    • Likes: 11,025,176
    • Sentiment: 0.025 (Neutral)

These videos combine very high engagement with neutral-to-positive comment sentiment, making them strong candidates for promoting your E-Learning platform to a broad audience.

Future work could involve expanding the dataset to include more videos and comments, exploring additional features for the predictive model, and experimenting with different modeling techniques to improve prediction accuracy.

We hope these insights and recommendations will be valuable in your promotional strategy.