Competition - YouTube

Enhance your brand using YouTube

📖 Background

You're a data scientist at a global marketing agency that helps some of the world's largest companies enhance their online presence.

Your new project is exciting: identify the most effective YouTube videos to promote your clients’ brands.

Forget simple metrics like views or likes; your job is to dive deep and discover who really connects with audiences through innovative content analysis.

💾 The Data

The data for this competition is stored in two tables, videos_stats and comments.

videos_stats.csv

This table contains aggregated data for each YouTube video:

  • Video ID: A unique identifier for each video.
  • Title: The title of the video.
  • Published At: The publication date of the video.
  • Keyword: The main keyword or topic of the video.
  • Likes: The number of likes the video has received.
  • Comments: The number of comments on the video.
  • Views: The total number of times the video has been viewed.

comments.csv

This table captures details about comments made on YouTube videos:

  • Video ID: The identifier of the video the comment was made on (matches Video ID in the videos_stats table).
  • Comment: The text of the comment.
  • Likes: How many likes this comment has received.
  • Sentiment: The sentiment score ranges from 0 (negative) to 2 (positive), indicating the tone of a comment.
import pandas as pd

# Load the video-level table
videos_stats = pd.read_csv('videos_stats.csv')

# Quick overview of the data
print(videos_stats.head())
videos_stats.info()  # info() prints its own summary and returns None
print(videos_stats.isnull().sum())
videos_stats
# Load the comment-level table
comments = pd.read_csv('comments.csv')

# Quick overview of the data
print(comments.head())
comments.info()  # info() prints its own summary and returns None
print(comments.isnull().sum())
comments
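
Before any analysis, it may help to confirm that the two tables actually fit together as the descriptions above suggest. The sketch below is only an illustration: `check_tables` is a hypothetical helper, the small frames are invented stand-ins for the real CSV files, and the checks assume that Video ID should be unique in videos_stats and that Sentiment lies between 0 and 2.

```python
import pandas as pd

# Invented stand-in data using the column names described above.
videos_stats = pd.DataFrame({
    "Video ID": ["a1", "b2", "c3"],
    "Keyword": ["tech", "tech", "music"],
    "Likes": [10, 20, 30],
    "Comments": [1, 2, 3],
    "Views": [100, 200, 300],
})
comments = pd.DataFrame({
    "Video ID": ["a1", "a1", "b2", "zz"],
    "Comment": ["great", "meh", "bad", "orphan"],
    "Likes": [5, 0, 1, 2],
    "Sentiment": [2, 1, 0, 2],
})


def check_tables(videos, comments):
    """Hypothetical helper: count rows that break the expected structure."""
    known_ids = set(videos["Video ID"])
    return {
        # Video IDs that appear more than once in the video table
        "duplicate_video_ids": int(videos["Video ID"].duplicated().sum()),
        # Comments whose Video ID has no match in the video table
        "orphan_comments": int((~comments["Video ID"].isin(known_ids)).sum()),
        # Sentiment values outside the documented 0 to 2 range (NaN counts as bad)
        "bad_sentiment": int((~comments["Sentiment"].between(0, 2)).sum()),
    }


report = check_tables(videos_stats, comments)
print(report)
```

With the real data, the same call would take the frames loaded from the CSV files; non-zero counts would indicate rows worth inspecting before merging.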

💪 Competition challenge

Create a report that covers the following:

  1. Exploratory Data Analysis of YouTube Trends:

    • Conduct an initial analysis of YouTube video trends across different industries. This analysis should explore basic engagement metrics such as views, likes, and comments and identify which types of content are most popular in each industry.
  2. Sentiment Analysis of Video Comments:

    • Perform a sentiment analysis on video comments to measure viewer perceptions. This task involves basic processing of text data and visualizing sentiment trends across various video categories.
  3. Development of a Video Ranking Model:

    • Create a simple model that uses sentiment analysis results and traditional engagement metrics to rank videos. This model should help identify potentially valuable videos for specific industry sectors.
  4. Strategic Recommendation for E-Learning Collaboration:

    • Use your model’s findings to identify YouTube videos that would be particularly effective for an E-Learning platform focused on Data and AI skills. Include recommendations for three specific videos, briefly explaining why each is ideal for promoting your E-Learning platform.
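
One possible shape for the ranking model in step 3 is sketched below. It is not a prescribed method: the function names `min_max` and `rank_videos`, the equal default weights, the engagement-rate definition, and the sample rows are all assumptions made for illustration. The score blends the mean comment sentiment (rescaled from the 0 to 2 range to 0 to 1) with a min-max normalised engagement rate.

```python
import pandas as pd

# Invented sample rows that follow the column names in the data description.
videos = pd.DataFrame({
    "Video ID": ["a1", "b2", "c3"],
    "Keyword": ["data science", "data science", "music"],
    "Likes": [50, 400, 900],
    "Comments": [5, 40, 10],
    "Views": [1000, 10000, 100000],
})
comments = pd.DataFrame({
    "Video ID": ["a1", "a1", "b2", "b2", "c3"],
    "Sentiment": [2, 2, 2, 0, 1],
})


def min_max(s):
    """Scale a Series to the 0-1 range; a constant Series maps to 0."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span else s * 0.0


def rank_videos(videos, comments, w_sentiment=0.5, w_engagement=0.5):
    """Hypothetical scoring: weighted mix of mean sentiment and engagement rate."""
    mean_sent = (
        comments.groupby("Video ID", as_index=False)["Sentiment"]
        .mean()
        .rename(columns={"Sentiment": "mean_sentiment"})
    )
    df = videos.merge(mean_sent, on="Video ID", how="left")
    # Likes plus comments per view; videos with zero views get NaN instead of inf
    views = df["Views"].where(df["Views"] > 0)
    df["engagement_rate"] = (df["Likes"] + df["Comments"]) / views
    df["score"] = (
        w_sentiment * df["mean_sentiment"] / 2  # sentiment rescaled to 0-1
        + w_engagement * min_max(df["engagement_rate"])
    )
    # Videos without comments have a NaN score and sort to the bottom
    return df.sort_values("score", ascending=False).reset_index(drop=True)


ranked = rank_videos(videos, comments)
print(ranked[["Video ID", "mean_sentiment", "engagement_rate", "score"]])
```

Filtering the ranked frame by Keyword before taking the top rows would be one way to shortlist videos for a specific sector, such as the Data and AI focus in step 4.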

🧑‍⚖️ Judging criteria

CATEGORY | WEIGHTING | DETAILS
Recommendations | 35%
  • Clarity of recommendations - how clear and well presented the recommendation is.
  • Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid?
  • Number of relevant insights found for the target audience.
Storytelling | 35%
  • How well the data and insights are connected to the recommendation.
  • How the narrative and whole report connects together.
  • Balancing making the report in-depth enough but also concise.
Visualizations | 20%
  • Appropriateness of visualization used.
  • Clarity of insight from visualization.
Votes | 10%
  • Upvoting - most upvoted entries get the most points.

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workbook reads well and explains how you found your insights.
  • Try to include an executive summary of your recommendations at the beginning.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

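# Merge the two dataframes on their shared 'Video ID' key.
# Both tables have a Likes column, so pandas adds suffixes:
# Likes_x comes from videos_stats, Likes_y from comments.
merged = pd.merge(videos_stats, comments, on='Video ID')

# Drop exact duplicate rows, keeping the first copy of each
# (keep=False would discard every copy of a duplicated row)
merged_cleaned = merged.drop_duplicates()

print(merged_cleaned.duplicated().sum())
merged_cleaned.head(20)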
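
The merge above silently drops rows that have no partner in the other table. As a hedged sketch of how that could be made visible, the example below uses the `validate` and `indicator` options of `pd.merge` on invented stand-in data. The explicit suffixes `_video` and `_comment` are an illustrative choice, not the naming used elsewhere in this workbook, and `validate="one_to_many"` assumes each Video ID appears only once in videos_stats.

```python
import pandas as pd

# Invented stand-in data; "zz" has no matching video and "c3" has no comments.
videos_stats = pd.DataFrame({
    "Video ID": ["a1", "b2", "c3"],
    "Likes": [10, 20, 30],
    "Views": [100, 200, 300],
})
comments = pd.DataFrame({
    "Video ID": ["a1", "a1", "b2", "zz"],
    "Comment": ["great", "great", "bad", "orphan"],
    "Likes": [5, 5, 1, 2],
})

checked = pd.merge(
    videos_stats,
    comments,
    on="Video ID",
    how="outer",                        # keep unmatched rows from both sides
    suffixes=("_video", "_comment"),    # readable names for the two Likes columns
    validate="one_to_many",             # raises if a Video ID repeats in videos_stats
    indicator=True,                     # adds a _merge column describing each row
)
print(checked["_merge"].value_counts())

# Keep only rows present in both tables, then drop exact duplicate rows
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
deduped = matched.drop_duplicates()
print(len(matched), len(deduped))
```

The counts in `_merge` show how many videos have no comments and how many comments point at unknown videos, which is useful context when interpreting any totals computed from the merged frame.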
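# Aggregate views, likes, and comment counts by keyword.
# merged_cleaned holds one row per comment, so each video's totals repeat on
# every comment row; keep one row per video before summing to avoid inflation.
per_video = merged_cleaned.drop_duplicates(subset=['Video ID', 'Keyword'])
industry_metrices = per_video.groupby('Keyword').agg({
    'Views': 'sum',
    'Likes_x': 'sum',  # Likes_x is the video-level like count from videos_stats
    'Comments': 'sum'  # Comments is the video-level comment count from videos_stats
})

# Sort by the metrics, highest first
industry_metrices_sort = industry_metrices.sort_values(by=['Views', 'Likes_x', 'Comments'], ascending=[False, False, False])
industry_metrices_sort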
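
Raw totals favour keywords that simply have more or larger videos. A complementary view, sketched below under stated assumptions, is to compute rates per view from the video-level table. The frame is invented sample data, and the names `per_keyword`, `likes_per_view`, and `comments_per_view` are hypothetical choices for this illustration.

```python
import pandas as pd

# Invented video-level rows with the columns described in the data section.
videos_stats = pd.DataFrame({
    "Video ID": ["a1", "b2", "c3", "d4"],
    "Keyword": ["tech", "tech", "music", "music"],
    "Likes": [10, 30, 50, 50],
    "Comments": [1, 3, 5, 15],
    "Views": [100, 300, 1000, 1000],
})

# Named aggregation: one row per keyword with video count and totals
per_keyword = videos_stats.groupby("Keyword").agg(
    videos=("Video ID", "nunique"),
    views=("Views", "sum"),
    likes=("Likes", "sum"),
    comments=("Comments", "sum"),
)

# Rates normalise for audience size, so small but engaged keywords stand out
per_keyword["likes_per_view"] = per_keyword["likes"] / per_keyword["views"]
per_keyword["comments_per_view"] = per_keyword["comments"] / per_keyword["views"]

per_keyword = per_keyword.sort_values("likes_per_view", ascending=False)
print(per_keyword)
```

In this toy data, music has five times the views of tech, yet tech earns twice as many likes per view, which is the kind of contrast that totals alone would hide.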
import matplotlib.pyplot as plt
import seaborn as sns

# Visualisation for views
sns.barplot(x=industry_metrices_sort.index, y=industry_metrices_sort['Views'], palette='viridis')

plt.xlabel('Keyword', fontsize=12)
plt.ylabel('Total Views', fontsize=12)
plt.xticks(rotation=90)
plt.title('Total views by Industry', fontsize=16)
plt.show()

# Visualisation for likes
sns.barplot(x=industry_metrices_sort.index, y=industry_metrices_sort['Likes_x'], palette='coolwarm')

plt.xlabel('Keyword', fontsize=12)
plt.ylabel('Total Likes', fontsize=12)
plt.xticks(rotation=90)
plt.title('Total likes by Industry', fontsize=16)
plt.show()

# Visualisation for comments 
sns.barplot(x=industry_metrices_sort.index, y=industry_metrices_sort['Comments'], palette='coolwarm')

plt.xlabel('Keyword', fontsize=12)
plt.ylabel('Total Comments', fontsize=12)
plt.xticks(rotation=90)
plt.title('Total comments by Industry', fontsize=16)
plt.show()