Competition - Internet News and Consumer Engagement
Internet News and Consumer Engagement
Ready to put your coding skills to the test? Join us for our Workspace Competition.
For more information, visit datacamp.com/workspacecompetition
Context
This dataset (source) contains data about news articles collected from Sept. 3, 2019 to Nov. 4, 2019. It was then enriched with Facebook engagement data, such as the number of shares, comments, and reactions. It was originally created to predict the popularity of an article before publication. However, there is much more you can analyze; take a look at the suggestions at the end of this template.
Load packages
library(skimr)
library(tidyverse)
Load your data
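# Read the gzipped CSV of news articles
articles <- readr::read_csv('data/news_articles.csv.gz')
# Treat publisher identifiers and names as categorical variables
articles$source_id <- as.factor(articles$source_id)
articles$source_name <- as.factor(articles$source_name)
# Summarize every column, dropping the numeric percentile and completeness columns
skim(articles) %>%
  select(-(numeric.p0:numeric.p100)) %>%
  select(-(complete_rate))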
Understand your data
Variable | Description |
---|---|
source_id | publisher unique identifier |
source_name | human-readable publisher name |
author | article author |
title | article headline |
description | article short description |
url | article URL from publisher website |
url_to_image | URL to main image associated with the article |
published_at | exact time and date of publishing the article |
content | unformatted content of the article truncated to 260 characters |
top_article | value indicating whether the article was listed as a top article on the publisher's website |
engagement_reaction_count | number of user reactions on Facebook posts involving the article URL |
engagement_comment_count | number of user comments on Facebook posts involving the article URL |
engagement_share_count | number of user shares of Facebook posts involving the article URL |
engagement_comment_plugin_count | number of user comments in the Facebook comment plugin on the article website |
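As one way to put the engagement columns described above to work, the sketch below adds a combined engagement total per article and averages it by publisher. It runs on a tiny made-up tibble (`toy_articles`, with invented values) so that it is self-contained. `total_engagement` and `engagement_by_source` are hypothetical names, and treating missing counts as zero is an assumption, not something the dataset description states.

```r
library(dplyr)

# Toy stand-in for `articles`; all values are invented for illustration only.
toy_articles <- tibble(
  source_name = c("Publisher A", "Publisher A", "Publisher B"),
  engagement_reaction_count = c(10, NA, 5),
  engagement_comment_count = c(2, 1, 0),
  engagement_share_count = c(3, 4, NA),
  engagement_comment_plugin_count = c(0, 0, 1)
)

engagement_by_source <- toy_articles %>%
  # Assumption: a missing count means no recorded engagement.
  mutate(across(starts_with("engagement_"), ~ coalesce(.x, 0))) %>%
  # Hypothetical combined measure: the sum of all four engagement counts.
  mutate(
    total_engagement = engagement_reaction_count +
      engagement_comment_count +
      engagement_share_count +
      engagement_comment_plugin_count
  ) %>%
  group_by(source_name) %>%
  summarise(
    n_articles = n(),
    mean_engagement = mean(total_engagement),
    .groups = "drop"
  ) %>%
  # Publishers with the highest average engagement come first.
  arrange(desc(mean_engagement))

print(engagement_by_source)
```

Replacing `toy_articles` with the loaded `articles` data frame should give the same summary for the real data, provided the column names match the table above.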
Now you can start exploring this dataset for a chance to win incredible prizes! Not sure where to start? Try your hand at these suggestions:
- Extract useful insights and visualize them in the most interesting way possible.
- Categorize the articles into different categories based on, for example, sentiment.
- Cluster the news articles, authors or publishers based on, for example, topic.
- Make a title generator based on data such as content, description, etc.
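As a starting point for the sentiment suggestion above, here is a minimal sketch that labels headlines by counting words from two small word lists. The word lists, the toy headlines, and the helper `count_matches` are all invented for illustration. A real analysis would more likely use an established sentiment lexicon or a trained model, so treat this only as an outline of the idea.

```r
library(dplyr)
library(stringr)

# Hypothetical, deliberately tiny word lists (not an established lexicon).
positive_words <- c("win", "growth", "record")
negative_words <- c("crisis", "crash", "fear")

# Toy headlines standing in for the `title` column; invented for illustration.
toy_titles <- tibble(title = c(
  "Markets hit record growth",
  "Fear spreads as crash deepens",
  "City council meets on Tuesday"
))

# Count whole-word, case-insensitive matches of any word in `words`.
count_matches <- function(text, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  str_count(str_to_lower(text), pattern)
}

titled <- toy_titles %>%
  mutate(
    # Score = positive matches minus negative matches.
    score = count_matches(title, positive_words) -
      count_matches(title, negative_words),
    sentiment = case_when(
      score > 0 ~ "positive",
      score < 0 ~ "negative",
      TRUE ~ "neutral"
    )
  )

print(titled)
```

The same `mutate()` step could be applied to `articles` directly, or to `description` instead of `title`, to compare how sentiment labels relate to engagement.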
Judging Criteria
CATEGORY | WEIGHTAGE | DETAILS |
---|---|---|
Analysis | 30% | |
Results | 30% | |
Creativity | 40% | |