The purpose of this project is to understand how our business interacts with customers online: specifically, how we have engaged with customers across our social media platforms, namely Facebook, LinkedIn, Instagram, and Twitter. The insights we derive will inform a better approach to reaching customers with the type of content they like. Customers who see content they enjoy stay happy and engaged, and happy, engaged customers translate into more profit, so we want to understand the best way to keep customers engaged with us through their favorite content.

Prerequisite: Data Preparation

The dataframe for the "Twitter" dataset contains 147 columns and 8,529 rows before cleaning and validation. I checked each column for missing values, data types, duplicates, and validity, and discovered that 110 of the columns contain only null values. Since these columns are empty, I removed them from the dataset, and I also removed any duplicated rows. I converted the "Date" column to a DateTime type and added "Day" and "Month" columns to denote the day of the week and the month of each date in the "Date" column. I did not convert the "Post ID" column to floats, even though it holds numerical values, because we will not be doing float-type computation on it. For the columns that do carry numerical values, I converted the values to floats for easier computation, and finally I replaced the remaining missing numerical values with 0.

After cleaning and validation, the dataset contains 39 columns and 8,529 rows.

The same cleaning and validation steps are applied to the Facebook, Instagram, and LinkedIn datasets; a reusable sketch of the full pipeline appears after the Twitter walkthrough below.

Step 1: Data familiarization. Here we see what the dataframe looks like, which columns have null values, which data types each column contains, and how many rows there are in the dataframe. Hidden cells in this workspace provide additional information.

# Importing relevant libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

twitter = pd.read_csv('Post Performance (Stanbic IBTC) January 1, 2013 - July 13, 2023_Twitter.csv')
# What does our dataframe look like? How many rows and columns do we have?

twitter.shape
# Let's check for missing values. Quite a lot of the columns have missing values as shown below

twitter.isna().sum()
# Cleaning the dataframe by removing columns where ALL the values are empty and by removing duplicated rows.

twitter = twitter.dropna(axis = 'columns', how = 'all').drop_duplicates()
# Let's see what the dataframe looks like now

twitter.info()
twitter
# Next, we convert the "Date" column to the DateTime type, then strip the '%' and ','
# characters from the remaining columns so their figures can be converted to float
# types for easier computing.

twitter['Date'] = pd.to_datetime(twitter['Date'])
twitter = twitter.replace(['%', ','], ['', ''], regex = True)
twitter.info()
twitter
# Converting the needed columns to float types. In case there are empty values left
# in any of these columns, we fill them with 0. It's a longer route but easier to understand.

float_columns = ['Impressions', 'Organic Impressions', 'Potential Reach',
                 'Engagement Rate (per Impression)', 'Engagements', 'Reactions',
                 'Likes', 'Comments', 'Shares', 'Click-Through Rate',
                 'Post Link Clicks', 'Other Post Clicks', 'Post Clicks (All)',
                 'Post Media Clicks', 'Post Hashtag Clicks', 'Post Detail Expand Clicks',
                 'Profile Clicks', 'Other Engagements', 'App Engagements',
                 'App Install Attempts', 'App Opens', 'Follows from Post',
                 'Unfollows from Post', 'bit.ly Link Clicks', 'Video Views', 'Media Views']
twitter = twitter.astype({column: 'float' for column in float_columns}).fillna(0)

# Also, let's add a "Day" and a "Month" column to denote the days and months corresponding to each date in the "Date" column.

twitter['Day'] = twitter['Date'].dt.day_name()
twitter['Month'] = twitter['Date'].dt.month_name()
twitter.info()
twitter

The dataframe has now successfully been cleaned.
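
Since the same steps apply to the Facebook, Instagram, and LinkedIn exports, they can be bundled into a single helper. The sketch below assumes each export shares the Twitter file's layout (a "Date" column plus a known list of numeric columns); the function name and its numeric_columns parameter are illustrative placeholders, not part of the original workflow.

# A reusable sketch of the cleaning steps above. The function name and the
# numeric_columns parameter are illustrative placeholders.
def clean_platform_export(df, numeric_columns):
    # Drop columns where ALL values are empty, and drop duplicated rows
    df = df.dropna(axis = 'columns', how = 'all').drop_duplicates()
    # Convert the "Date" column, then strip '%' and ',' so figures parse as floats
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.replace(['%', ','], ['', ''], regex = True)
    # Convert the numeric columns to floats and fill remaining gaps with 0
    df = df.astype({column: 'float' for column in numeric_columns}).fillna(0)
    # Add "Day" and "Month" columns derived from "Date"
    df['Day'] = df['Date'].dt.day_name()
    df['Month'] = df['Date'].dt.month_name()
    return df

Each of the other platforms can then be cleaned with one call, for example facebook = clean_platform_export(facebook, facebook_numeric_columns), where the column list depends on that platform's export.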

Step 2: Next, we look at insights. We will try to keep our visualizations simple.

On our Twitter outlet, photos are the type of post with the highest average engagement, followed by videos, texts, and links.

# On Twitter, photos are the posts with the highest average engagements, followed by videos, texts and links.

order = ['Photo', 'Video', 'Text', 'Link']
sns.barplot(x = 'Content Type', y = 'Engagements', data = twitter, ci = None, order = order)
plt.ylabel('Average Engagements', fontsize = 14)
plt.xlabel('Type Of Content', fontsize = 14)
plt.title('Average Engagement By Content Type - Twitter', fontsize = 16, pad = 40)
plt.show()
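# As a quick numerical check on the chart above, we can compute the same averages
# directly from the cleaned dataframe (a small sketch, not part of the original workflow).
print(twitter.groupby('Content Type')['Engagements'].mean().sort_values(ascending = False))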
# Which content creators sent the most posts?
print(twitter['Sent by'].value_counts())

Of the named individuals among the content creators, Sophia made the highest number of posts.
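
To see this at a glance, the same counts can be drawn as a simple chart in the style of the plot above. This is a sketch, assuming the "Sent by" column holds the creator names returned by value_counts() above.

# Sketch: posts per sender, drawn horizontally so the names stay readable
sns.countplot(y = 'Sent by', data = twitter, order = twitter['Sent by'].value_counts().index)
plt.xlabel('Number Of Posts', fontsize = 14)
plt.ylabel('Sent By', fontsize = 14)
plt.title('Posts By Sender - Twitter', fontsize = 16, pad = 40)
plt.show()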