Analyzing global internet patterns
Background
In this competition, you'll explore a dataset of internet usage by country from 2000 to 2023. Your goal is to import, clean, analyze, and visualize the data in your preferred tool.
The end goal is a clean, self-explanatory, and interactive visualization. Through a thorough analysis, you'll examine how internet usage has changed over time and which countries are still most affected by a lack of internet access.
Data
You have access to the following file, but you can supplement your data with other sources to enrich your analysis.
Internet Usage (internet_usage.csv)
| Column name | Description |
|---|---|
| Country Name | Name of the country |
| Country Code | The country's three-character code |
| 2000 | Percentage of the population using the internet in 2000 |
| 2001 | Percentage of the population using the internet in 2001 |
| 2002 | Percentage of the population using the internet in 2002 |
| 2003 | Percentage of the population using the internet in 2003 |
| ... | ... |
| 2023 | Percentage of the population using the internet in 2023 |
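The table describes a wide layout, with one column per year. Many plotting and grouping operations are simpler in long ("tidy") format, with one row per country and year.

The sketch below makes two assumptions. First, missing values are stored as the string `..`, as the cleaning code further down treats them. Second, the helper name `load_long`, the two countries and their values are hypothetical placeholders, not real data.

```python
import io

import pandas as pd

# Hypothetical two-country extract in the same wide layout as internet_usage.csv.
RAW_CSV = """Country Name,Country Code,2000,2001,2002
Country A,AAA,..,1.5,2.0
Country B,BBB,10.0,..,12.5
"""


def load_long(csv_source) -> pd.DataFrame:
    """Hypothetical helper: read the wide file, treat '..' as missing,
    and reshape it to one row per country and year."""
    wide = pd.read_csv(csv_source, na_values="..")
    long = wide.melt(
        id_vars=["Country Name", "Country Code"],
        var_name="Year",
        value_name="Usage (%)",
    )
    # Year headers are read as strings; convert them to integers for sorting and plotting.
    long["Year"] = long["Year"].astype(int)
    return long.sort_values(["Country Name", "Year"], ignore_index=True)


long_df = load_long(io.StringIO(RAW_CSV))
print(long_df)
```

With the real file, `load_long("internet_usage.csv")` would replace the `StringIO` toy input. Passing `na_values=".."` to `read_csv` marks the placeholders as missing at load time, so no separate replacement step is needed.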
The data can be downloaded from the Files section (File > Show workbook files).
Challenge
Use a tool of your choice to create an interesting visual or dashboard that summarizes your analysis!
Things to consider:
- Use this Workspace to prepare your data (optional).
- Stuck on where to start? Here are some ideas:
  - Visualize internet usage over time, by country
  - How has internet usage changed over time? Are any patterns emerging?
  - Consider bringing in other data to supplement your analysis
- Take a screenshot of your main dashboard or visuals and paste it in the designated field.
- Summarize your findings in an executive summary.
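For the questions about change over time, one simple starting point is a per-country summary with three parts:

- the first value
- the latest value
- the change between them in percentage points

Sorting by the latest value surfaces the least-connected countries.

This sketch assumes a cleaned wide DataFrame indexed by country, with numeric year columns in chronological order and no gaps (such as `df5` further down). The function name `summarize_change` and the sample values are hypothetical.

```python
import pandas as pd


def summarize_change(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: first and latest usage per country and the change between them.

    `wide` is indexed by country name, with one numeric column per year in
    chronological order and no missing values.
    """
    first_year, last_year = wide.columns[0], wide.columns[-1]
    summary = pd.DataFrame({"first": wide[first_year], "latest": wide[last_year]})
    # Change in percentage points, not a relative growth rate.
    summary["change_pp"] = summary["latest"] - summary["first"]
    # Least-connected countries (lowest latest value) come first.
    return summary.sort_values("latest")


# Hypothetical values for illustration only.
sample = pd.DataFrame(
    {"2000": [0.5, 30.0, 5.0], "2023": [20.0, 95.0, 60.0]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="Country Name"),
)
print(summarize_change(sample))
```

Because the first and last columns are picked by position, the same function would also work on column names such as `y_2000`.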
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("internet_usage.csv")
data.head()

# Replace the '..' placeholders with NaN so pandas treats them as missing values
data_filled = data.replace('..', np.nan)
data_fill = data_filled.drop(columns='Country Code')
data_fill.head()

# Transpose so that each row is a year and each column is a country
data_by_country = data_fill.set_index('Country Name')
data_fill_pivoted = data_by_country.T
data_fill_pivoted.index.name = 'Year'
data_fill_pivoted

# Save the pivoted table to CSV
data_fill_pivoted.to_csv('df2.csv')

# Which countries have no recorded internet usage?
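# Check every year column; 'Country Name' is the only non-year column left
year_values = data_fill.drop(columns='Country Name')
df3_nulls = data_fill[year_values.isna().all(axis=1)]
df3_nulls

Seven countries have no recorded internet usage for any year between 2000 and 2023.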
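Selecting the year columns by name, rather than by position, keeps this check correct regardless of which identifier columns are still present. A positional slice such as `iloc[:, 2:]` assumes two identifier columns. Once `Country Code` has been dropped, that slice silently skips the first year.

The toy frame below illustrates the difference. The helper name `countries_without_data` and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def countries_without_data(df: pd.DataFrame, id_cols=("Country Name", "Country Code")) -> list:
    """Hypothetical helper: names of countries whose year columns are all missing."""
    # Exclude identifier columns by name; names that are absent are simply ignored.
    year_cols = df.columns.difference(list(id_cols))
    all_missing = df[year_cols].isna().all(axis=1)
    return df.loc[all_missing, "Country Name"].tolist()


# Hypothetical data with 'Country Code' already dropped.
sample = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country C"],
        "2000": [np.nan, 1.0, np.nan],
        "2001": [np.nan, np.nan, 4.0],
    }
)
print(countries_without_data(sample))  # ['Country A']

# For contrast: skipping two columns by position also skips "2000" here,
# so Country B (which has a 2000 value) is wrongly flagged.
positional = sample.loc[sample.iloc[:, 2:].isna().all(axis=1), "Country Name"].tolist()
print(positional)  # ['Country A', 'Country B']
```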
# Keep only countries with recorded internet usage
df3 = data_fill.copy()
df3_internet = df3[~df3['Country Name'].isin(df3_nulls['Country Name'])]
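df3_internet.head()

# Fill gaps along each row: backward fill first, then forward fill any trailing gaps.
# Only the year columns are filled, so 'Country Name' is never propagated into them.
year_cols = df3_internet.columns.drop('Country Name')
df4 = df3_internet.copy()
df4[year_cols] = df4[year_cols].bfill(axis=1).ffill(axis=1)
df4.head()

# Convert the year columns from object (string) dtype to float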
clean_data = df4.drop(columns=['Country Name']).astype(float)
df_clean = pd.concat([df4[['Country Name']],clean_data], axis=1)
df_clean.info()
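The backward-then-forward fill used above copies the nearest later observation into each gap. As a result, an interior gap between two different values takes the later value.

Linear interpolation is a common alternative for interior gaps. It is a swapped-in technique, not what this notebook does. The toy comparison below uses one hypothetical row. In both variants, leading and trailing gaps still fall back to backward and forward fill.

```python
import numpy as np
import pandas as pd

# One hypothetical country with gaps at the start, middle, and end.
row = pd.DataFrame(
    [[np.nan, 2.0, np.nan, 6.0, np.nan]],
    columns=["2000", "2001", "2002", "2003", "2004"],
    index=["Country A"],
)

# Backward fill, then forward fill, along the year axis.
filled = row.bfill(axis=1).ffill(axis=1)

# Alternative: interpolate interior gaps linearly (limit_area="inside" leaves
# leading and trailing gaps alone), then fill the edges as before.
interpolated = row.interpolate(axis=1, limit_area="inside").bfill(axis=1).ffill(axis=1)

print(filled)
print(interpolated)
```

The two results differ only in the interior gap (2002). Fill gives 6.0 there; interpolation gives 4.0, halfway between its neighbours. Which choice is more reasonable depends on how usage is expected to evolve between survey years.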
# Index by country name so that only the yearly values remain as columns
df5 = df_clean.set_index('Country Name')
df5.columns = 'y' + '_' + df5.columns
df5.head()

# Transpose so that years are rows, then save to CSV
df6 = df5.T
df6.to_csv('df6.csv')

# Calculate the average internet usage per year across countries (each country weighted equally)
internet_year = df5.mean(axis=0).round(2).reset_index()
internet_year.columns = ['year', 'mean_usage']
internet_year.head()

# Show with a line plot how the average internet usage changed over the years
plt.figure(figsize=(6, 4))
sns.lineplot(data=internet_year, x='year', y='mean_usage', color='blue')
plt.title('Average internet usage across countries over the years')
plt.xlabel('Year')
plt.ylabel('Mean % of population using the internet')
plt.xticks(rotation=90)
plt.show()
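The line plot averages countries with equal weight, so a small country counts as much as a large one. The result is therefore not the share of the world's population that is online.

If population data is brought in as a supplement, as the challenge suggests, a population-weighted average is one alternative. The sketch below assumes both frames are indexed by country with matching year columns and no missing values. The function name `weighted_usage` and the population figures are hypothetical.

```python
import pandas as pd


def weighted_usage(usage: pd.DataFrame, population: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: population-weighted % of people online per year.

    `usage` holds percentages and `population` holds head counts; both are
    indexed by country with the same year columns and no missing values.
    """
    # Keep only the countries and years present in both frames.
    usage, population = usage.align(population, join="inner")
    users = usage / 100 * population
    return (users.sum() / population.sum() * 100).round(2)


# Hypothetical values: a large country with low usage and a small one with high usage.
index = pd.Index(["Country A", "Country B"], name="Country Name")
usage = pd.DataFrame({"2000": [10.0, 50.0], "2023": [50.0, 90.0]}, index=index)
population = pd.DataFrame({"2000": [900, 100], "2023": [900, 100]}, index=index)

print(usage.mean())                       # unweighted: each country counts equally
print(weighted_usage(usage, population))  # weighted by population
```

Here the unweighted means are 30 and 70, while the weighted shares are 14 and 54. The populous, less-connected country pulls the weighted figure down.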