
How Much of the World Has Access to the Internet?

📖 Background

You work for a policy consulting firm. One of the firm's principals is preparing to give a presentation on the state of internet access in the world. She needs your help answering some questions about internet accessibility across the world.

💾 The data

The research team compiled the following tables (source):
internet
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2019.
  • "Internet_Usage" - The share of the entity's population (in %) who have used the internet in the last three months.
people
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2020.
  • "Users" - The number of people who have used the internet in the last three months for that country, region, or group.
broadband
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1998 to 2020.
  • "Broadband_Subscriptions" - The number of fixed subscriptions to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.
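All three tables share the "Entity", "Code", and "Year" columns, so they can be aligned on those keys. A minimal sketch of that join, assuming the column names listed above; join_tables and the is_country column are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Key columns shared by the internet, people, and broadband tables
KEYS = ['Entity', 'Code', 'Year']


def join_tables(internet: pd.DataFrame, people: pd.DataFrame,
                broadband: pd.DataFrame) -> pd.DataFrame:
    """Outer-join the three tables on their shared keys (hypothetical helper).

    An outer join keeps rows that exist in only one table and leaves NaN
    in the columns coming from the other tables.
    """
    merged = internet.merge(people, on=KEYS, how='outer')
    merged = merged.merge(broadband, on=KEYS, how='outer')
    # Per the data description, 'Code' is null for regions and groups
    merged['is_country'] = merged['Code'].notna()
    return merged
```

An outer join is used because the year ranges differ (broadband starts in 1998, people extends to 2020); an inner join would silently drop those years.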

Acknowledgments: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org.

💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the top 5 countries with the highest internet use (by population share)?
  2. How many people had internet access in those countries in 2019?
  3. What are the top 5 countries with the highest internet use for each of the following regions: 'Middle East & North Africa', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'Europe & Central Asia'?
  4. Create a visualization for those six regions' internet usage over time.
  5. What are the 5 countries with the most internet users?
  6. What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
  7. Summarize your findings.
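For question 6, one way to approach the correlation is to align the 2019 country rows of the internet and broadband tables and compare the two columns. A minimal sketch, assuming the column names "Internet_Usage" and "Broadband_Subscriptions"; usage_broadband_correlation is a hypothetical helper. Because "Broadband_Subscriptions" is an absolute count while "Internet_Usage" is a population share, a rank-based (Spearman) coefficient is reported alongside Pearson's r.

```python
import pandas as pd


def usage_broadband_correlation(internet: pd.DataFrame, broadband: pd.DataFrame,
                                year: int = 2019) -> dict:
    """Correlate internet usage share with broadband subscriptions for one year.

    Hypothetical helper: keeps country rows only (non-null 'Code') and compares
    countries that appear in both tables for that year.
    """
    keys = ['Entity', 'Code', 'Year']
    usage = internet[(internet['Year'] == year) & internet['Code'].notna()]
    subs = broadband[(broadband['Year'] == year) & broadband['Code'].notna()]
    both = usage.merge(subs, on=keys, how='inner')
    x, y = both['Internet_Usage'], both['Broadband_Subscriptions']
    return {
        'n': len(both),
        'pearson': x.corr(y),
        # Spearman = Pearson correlation of the ranks (ties get average ranks);
        # computed this way to avoid needing scipy
        'spearman': x.rank().corr(y.rank()),
    }
```

The rank-based coefficient is less dominated by a few very populous countries with large subscription counts.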

Note: The regions follow the World Bank's regional classification.

Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

pd.set_option('display.float_format', '{:,.2f}'.format)
# Read the broadband table
broadband = pd.read_csv('data/broadband.csv')

# Read the internet table
internet = pd.read_csv('data/internet.csv')

# Read the people table
people = pd.read_csv('data/people.csv')

# World Bank country classification (maps each country 'Code' to its 'Region')
region_df = pd.read_excel('data/CLASS.xlsx')
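Before answering the questions, it can help to confirm that each table has the columns the analysis relies on, so a renamed column fails early with a clear message. A minimal sketch; the expected column names are assumed from the code in this notebook, and EXPECTED and check_columns are hypothetical names.

```python
import pandas as pd

# Columns the analysis relies on (assumed from the code in this notebook)
EXPECTED = {
    'internet': {'Entity', 'Code', 'Year', 'Internet_Usage'},
    'people': {'Entity', 'Code', 'Year', 'Users'},
    'broadband': {'Entity', 'Code', 'Year', 'Broadband_Subscriptions'},
}


def check_columns(name: str, df: pd.DataFrame) -> None:
    """Raise a KeyError listing any expected columns missing from df."""
    missing = EXPECTED[name] - set(df.columns)
    if missing:
        raise KeyError(f"{name} is missing columns: {sorted(missing)}")
```

For example, check_columns('internet', internet) returns silently when the table matches and raises otherwise.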

Q1. What are the top 5 countries with the highest internet use (by population share)?

year = 2019

# Country rows only ('Code' is null for regions and groups) for the chosen year,
# ranked by the share of the population using the internet
countries_only = internet[internet['Code'].notna() & (internet['Year'] == year)]
df = (countries_only
      .sort_values(by='Internet_Usage', ascending=False)
      .reset_index(drop=True)
      .head(5))
df
plt.figure(figsize=(16, 4))
sns.barplot(data=df, y='Entity', x='Internet_Usage')
plt.xlabel('% of population using the internet')
plt.title(f'Top 5 countries by internet usage, {year}')
plt.tight_layout()
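The same ranking pattern applies to any table and column, which also covers question 5 (most internet users). A minimal sketch using DataFrame.nlargest; top_countries is a hypothetical helper, and dropping codes that start with "OWID_" assumes Our World in Data's convention for aggregates such as "OWID_WRL" (World), which would otherwise top a ranking by user counts.

```python
import pandas as pd


def top_countries(table: pd.DataFrame, value_col: str, year: int,
                  n: int = 5) -> pd.DataFrame:
    """Return the n country rows with the largest value_col in the given year.

    Hypothetical helper. Countries are rows with a non-null 'Code'; codes
    starting with 'OWID_' are also excluded as aggregates (an assumption).
    """
    code = table['Code']
    is_country = code.notna() & ~code.astype(str).str.startswith('OWID_')
    rows = table[is_country & (table['Year'] == year)]
    # nlargest uses keep='first' by default, so ties favor earlier rows
    return rows.nlargest(n, value_col).reset_index(drop=True)
```

For example, top_countries(internet, 'Internet_Usage', 2019) reproduces the Q1 ranking, and top_countries(people, 'Users', 2019) ranks countries by user count.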

Q2. How many people had internet access in those countries in 2019?

# Attach the 2019 user counts from the people table to the top-5 countries
tmp = pd.merge(df, people, on=['Entity', 'Code', 'Year'], how='inner')
tmp
plt.figure(figsize=(10, 8))
ax = sns.scatterplot(data=tmp, x='Users', y='Internet_Usage')

# Label each point with its country name, offset by a few points so the text
# does not sit on top of the marker
for row in tmp.itertuples(index=False):
    ax.annotate(row.Entity, (row.Users, row.Internet_Usage),
                xytext=(5, 5), textcoords='offset points',
                ha='left', size='small', color='black', weight='semibold')

plt.ylim(0, 110)
plt.xlabel('Internet users (people)')
plt.ylabel('% of population using the internet')
plt.title('Internet users vs. internet usage as a share of the population, 2019')
plt.tight_layout()
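Because "Internet_Usage" is a percentage of the population and "Users" is a head count, the two together imply a population estimate, population ≈ Users / (Internet_Usage / 100), which can serve as a rough consistency check on the merged table. A minimal sketch; implied_population is a hypothetical helper, and the estimate is only as consistent as the two source series.

```python
import pandas as pd


def implied_population(df: pd.DataFrame) -> pd.Series:
    """Estimate population as Users / (Internet_Usage / 100).

    Hypothetical helper; assumes 'Users' is a head count and 'Internet_Usage'
    a percentage. Rows with zero or missing usage give NaN instead of inf.
    """
    share = df['Internet_Usage'].where(df['Internet_Usage'] > 0) / 100
    return df['Users'] / share
```

For example, tmp.assign(Implied_Population=implied_population(tmp)) adds the estimate as a new column next to each country.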

Q3. What are the top 5 countries with the highest internet use for each of the following regions: 'Middle East & North Africa', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'Europe & Central Asia'?


# The six World Bank regions named in the question
regions = ['Middle East & North Africa', 'Latin America & Caribbean', 'East Asia & Pacific',
           'South Asia', 'North America', 'Europe & Central Asia']

fig, axs = plt.subplots(2, 3, figsize=(16, 8), sharex=True)

# Latest available year for each entity
max_years = internet.groupby('Entity')['Year'].max().reset_index()
# Attach each country's region via its code, then keep only each entity's latest row
tmp = pd.merge(internet, region_df[['Code', 'Region']], on='Code', how='left')
tmp = pd.merge(tmp, max_years, on=['Entity', 'Year'], how='inner')
tmp = tmp[['Region', 'Entity', 'Code', 'Year', 'Internet_Usage']].sort_values(by='Internet_Usage', ascending=False)

for ax, region in zip(axs.flatten(), regions):
    top5 = tmp[tmp['Region'] == region].head(5)
    sns.barplot(data=top5, x='Internet_Usage', y='Entity', ax=ax)
    ax.set_title(region)
    ax.set_xlabel('% Internet Usage')

plt.tight_layout()
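Question 4 asks how internet usage in these regions changed over time. A minimal sketch that reuses the region mapping by country code; regional_usage_over_time is a hypothetical helper, it assumes region_df has "Code" and "Region" columns, and it summarizes each region by the unweighted median across its countries, so small and large countries count equally.

```python
import pandas as pd


def regional_usage_over_time(internet: pd.DataFrame, region_df: pd.DataFrame,
                             regions: list) -> pd.DataFrame:
    """Year-by-region table of the median country's internet usage share.

    Hypothetical helper: rows are years, columns are regions, values are the
    unweighted median of 'Internet_Usage' across the region's countries.
    """
    # Inner join on 'Code' keeps only entities that have a region assignment
    tmp = internet.merge(region_df[['Code', 'Region']], on='Code', how='inner')
    tmp = tmp[tmp['Region'].isin(regions)]
    return tmp.pivot_table(index='Year', columns='Region',
                           values='Internet_Usage', aggfunc='median')
```

Calling regional_usage_over_time(internet, region_df, regions).plot(figsize=(12, 6)) then draws one line per region; a population-weighted average would need population figures that these tables do not provide directly.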