Internet News and Consumer Engagement

This dataset contains news articles published between early September and early November 2019, enriched with Facebook engagement data such as the number of shares, comments, and reactions. The dataset was originally created to predict an article's popularity before publication; however, there is a lot more you can analyze!

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
news = pd.read_csv("news_articles.csv", index_col=0)
print(news.shape)
news.head(100)

Data dictionary

	Variable	Description
0	source_id	publisher unique identifier
1	source_name	human-readable publisher name
2	author	article author
3	title	article headline
4	description	article short description
5	url	article URL from publisher website
6	url_to_image	URL of the main image associated with the article
7	published_at	exact time and date of publishing the article
8	content	unformatted content of the article truncated to 260 characters
9	top_article	value indicating if article was listed as a top article on publisher website
10	engagement_reaction_count	number of user reactions on Facebook posts involving the article URL
11	engagement_comment_count	number of user comments on Facebook posts involving the article URL
12	engagement_share_count	number of user shares of Facebook posts involving the article URL
13	engagement_comment_plugin_count	number of user comments left through the Facebook comment plugin on the article website

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

🗺️ Explore: Which publishers and authors publish the most content in this dataset? Which publish the most engaging content?
📊 Visualize: Create two word clouds, one for the titles and one for the descriptions of the articles, to find the most popular words. Make sure to remove stop words!
🔎 Analyze: On days where total engagement was higher than usual, can you identify a common event or theme based on text?
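
As a starting point for the Explore and Visualize challenges, here is a minimal sketch. It runs on a few made-up rows rather than news_articles.csv: only the column names come from the data dictionary, while the values, the `total_engagement` column, the `STOP_WORDS` set, and the `word_frequencies` helper are assumptions for illustration. Summing reactions, comments, and shares is just one possible definition of engagement.

```python
import re
from collections import Counter

import pandas as pd

# Made-up rows standing in for news_articles.csv (values are illustrative only).
sample = pd.DataFrame({
    "source_name": ["BBC News", "BBC News", "Reuters", "CNN"],
    "author": ["A. Writer", "A. Writer", "B. Reporter", None],
    "title": [
        "The markets rally as talks resume",
        "Talks resume after the markets close",
        "Storm hits the coast",
        "Markets and the storm",
    ],
    "engagement_reaction_count": [10, 30, 5, 100],
    "engagement_comment_count": [1, 3, 0, 20],
    "engagement_share_count": [2, 6, 1, 40],
})

# Explore: article counts per publisher and per author.
# value_counts() drops missing authors by default.
articles_per_source = sample["source_name"].value_counts()
articles_per_author = sample["author"].value_counts()

# Assumed definition of engagement: reactions + comments + shares.
engagement_cols = [
    "engagement_reaction_count",
    "engagement_comment_count",
    "engagement_share_count",
]
sample["total_engagement"] = sample[engagement_cols].sum(axis=1)
mean_engagement = (
    sample.groupby("source_name")["total_engagement"]
    .mean()
    .sort_values(ascending=False)
)

# Visualize: word frequencies after removing a tiny hand-picked stop-word list.
STOP_WORDS = {"the", "as", "after", "and", "a", "of", "to", "in"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count lower-cased words in a Series of strings, skipping stop words."""
    counts = Counter()
    for text in texts.dropna():
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


title_counts = word_frequencies(sample["title"])
print(articles_per_source)
print(mean_engagement)
print(title_counts.most_common(3))
```

On the real data, `sample` would be replaced by `news`, and the stop-word set would need to be much larger. The resulting counts could then be passed to a word-cloud library of your choice.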

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You have a friend who works as a reporter for BBC News. He's been disappointed by his articles' low Facebook engagement and by the fact that none of them has ever been listed as a top article on the BBC. You've offered to help by finding data-driven recommendations on how he should position his articles (such as guidelines on title and description) and at what time of day he should publish them. He's interested in what makes a top article at the BBC and what gets the most Facebook engagement.

You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
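
One way to approach the time-of-day question is to group one publisher's articles by publishing hour. The sketch below uses made-up rows, and it assumes that `published_at` holds ISO 8601 timestamps in UTC and that `top_article` is a 0/1 flag; check both against the real file before relying on it. The `hour` column and the `by_hour` frame are names introduced here for illustration.

```python
import pandas as pd

# Made-up rows; the timestamp format and the 0/1 top_article flag are assumptions.
sample = pd.DataFrame({
    "source_name": ["BBC News"] * 4 + ["Reuters"],
    "published_at": [
        "2019-09-03T08:15:00Z",
        "2019-09-03T08:45:00Z",
        "2019-09-04T17:05:00Z",
        "2019-09-05T17:30:00Z",
        "2019-09-05T08:00:00Z",
    ],
    "top_article": [1.0, 0.0, 0.0, 0.0, 1.0],
    "engagement_share_count": [40, 20, 5, 15, 99],
})

# Parse the timestamps; errors="coerce" turns unparseable values into NaT.
sample["published_at"] = pd.to_datetime(
    sample["published_at"], utc=True, errors="coerce"
)
sample["hour"] = sample["published_at"].dt.hour  # hour of day in UTC

# Restrict to one publisher, then summarize each publishing hour.
bbc = sample[sample["source_name"] == "BBC News"]
by_hour = bbc.groupby("hour").agg(
    articles=("top_article", "size"),          # number of articles in that hour
    top_article_rate=("top_article", "mean"),  # share flagged as top article
    mean_shares=("engagement_share_count", "mean"),
)
print(by_hour)
```

Hours with only a handful of articles give noisy rates, so the `articles` column is worth reporting alongside any recommendation.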

news.columns

news['source_name'].value_counts()

# Totals per publisher; string aggregator names avoid passing the Python builtin `sum`
news.groupby(['source_name']).agg(
    top_article_sum = ('top_article', 'sum'),
    engagement_reaction = ('engagement_reaction_count', 'sum'),
    engagement_comment = ('engagement_comment_count', 'sum'),
    engagement_share = ('engagement_share_count', 'sum'),
    engagement_comment_plugin = ('engagement_comment_plugin_count', 'sum'))

source_grp_mean_df = news.groupby(['source_name']).agg(
    top_article_mean = ('top_article', 'mean'),
    engagement_reaction = ('engagement_reaction_count', 'mean'),
    engagement_comment = ('engagement_comment_count', 'mean'),
    engagement_share = ('engagement_share_count', 'mean'),
    engagement_comment_plugin = ('engagement_comment_plugin_count', 'mean'))

source_grp_mean_df

import matplotlib.pyplot as plt

source_grp_mean_df.sort_values(by = 'top_article_mean', ascending = False)

plt.figure(figsize=(10, 5))
plt.gca().set_facecolor("lavender")

df = source_grp_mean_df.sort_values(by = 'top_article_mean', ascending = False)

# plot the bar
plt.bar(df.index, df.top_article_mean, color = "green")

# giving title to the plot
plt.title("Average top article by Source", weight="bold", color="blue")

# giving X and Y labels
plt.xlabel("Source", weight="bold", color="brown")
plt.ylabel("Average Top Article", weight="bold", color="brown")

# Modifying the ticks
plt.xticks(rotation=90, color="blue", weight="bold")
plt.yticks(color="blue", weight="bold")

plt.figure(figsize=(10, 5))
plt.gca().set_facecolor("lavender")

df = source_grp_mean_df.sort_values(by = 'engagement_reaction', ascending = False)

# plot the bar
plt.bar(df.index, df.engagement_reaction, color = "green")

# giving title to the plot
plt.title("Engagement Reaction by Source", weight="bold", color="blue")

# giving X and Y labels
plt.xlabel("Source", weight="bold", color="brown")
plt.ylabel("Engagement Reaction", weight="bold", color="brown")

# Modifying the ticks
plt.xticks(rotation=90, color="blue", weight="bold")
plt.yticks(color="blue", weight="bold")

source_grp_mean_df.columns

plt.figure(figsize=(10, 5))
plt.gca().set_facecolor("lavender")

df = source_grp_mean_df.sort_values(by = 'engagement_comment', ascending = False)

# plot the bar
plt.bar(df.index, df.engagement_comment, color = "green")

# giving title to the plot
plt.title("Engagement Comment by Source", weight="bold", color="blue")

# giving X and Y labels
plt.xlabel("Source", weight="bold", color="brown")
plt.ylabel("Engagement Comment", weight="bold", color="brown")

# Modifying the ticks
plt.xticks(rotation=90, color="blue", weight="bold")
plt.yticks(color="blue", weight="bold")

plt.figure(figsize=(10, 5))
plt.gca().set_facecolor("lavender")

df = source_grp_mean_df.sort_values(by = 'engagement_share', ascending = False)

# plot the bar
plt.bar(df.index, df.engagement_share, color = "green")

# giving title to the plot
plt.title("Engagement Share by Source", weight="bold", color="blue")

# giving X and Y labels
plt.xlabel("Source", weight="bold", color="brown")
plt.ylabel("Engagement Share", weight="bold", color="brown")

# Modifying the ticks
plt.xticks(rotation=90, color="blue", weight="bold")
plt.yticks(color="blue", weight="bold")
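
The four charts above differ only in the column, title, and y-axis label, so the repeated cell could be folded into a small helper. This is a sketch, not part of the original analysis: `plot_mean_by_source` and the `demo` frame are hypothetical names, and the bold tick labels from the original cells are left out for brevity.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_mean_by_source(means, column, title, ylabel):
    """Draw one bar chart of `column` per source, sorted descending; return the Axes."""
    ordered = means.sort_values(by=column, ascending=False)
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.set_facecolor("lavender")
    ax.bar(ordered.index, ordered[column], color="green")
    ax.set_title(title, weight="bold", color="blue")
    ax.set_xlabel("Source", weight="bold", color="brown")
    ax.set_ylabel(ylabel, weight="bold", color="brown")
    ax.tick_params(axis="x", labelrotation=90, labelcolor="blue")
    ax.tick_params(axis="y", labelcolor="blue")
    return ax


# Made-up stand-in for source_grp_mean_df (values are illustrative only).
demo = pd.DataFrame(
    {"engagement_share": [3.0, 9.0, 1.0]},
    index=pd.Index(["A", "B", "C"], name="source_name"),
)
ax = plot_mean_by_source(
    demo, "engagement_share", "Engagement Share by Source", "Engagement Share"
)
```

With the real data, each chart becomes a single call such as `plot_mean_by_source(source_grp_mean_df, 'engagement_comment', 'Engagement Comment by Source', 'Engagement Comment')`.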

