
How Much of the World Has Access to the Internet?

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Read the broadband table
broadband = pd.read_csv('data/broadband.csv')
# Read the internet table
internet = pd.read_csv('data/internet.csv')
# Read the people table
people = pd.read_csv('data/people.csv')

# Take a look at the first rows
broadband.head()

💾 The data

The research team compiled the following tables (source):
internet
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2019.
  • "Internet_Usage" - The share of the entity's population who have used the internet in the last three months.
people
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2020.
  • "Users" - The number of people who have used the internet in the last three months for that country, region, or group.
broadband
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1998 to 2020.
  • "Broadband_Subscriptions" - The number of fixed subscriptions to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.

Acknowledgments: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org.
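
The three tables described above share the "Entity", "Code", and "Year" columns, so they can be filtered to countries and joined on those keys. The following is only a minimal sketch on made-up rows: the `*_demo` frames and their values are hypothetical, and the "Internet_Usage" capitalization is assumed to match the code used later in this notebook.

```python
import pandas as pd

# Made-up rows that only mimic the documented column layout (not real data).
internet_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Region X"],
    "Code": ["AAA", "BBB", None],
    "Year": [2019, 2019, 2019],
    "Internet_Usage": [90.0, 40.0, 65.0],
})
people_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Region X"],
    "Code": ["AAA", "BBB", None],
    "Year": [2019, 2019, 2019],
    "Users": [900, 4000, 4900],
})

# Countries carry a code; regions and groups have a null Code.
countries_only = internet_demo[internet_demo["Code"].notna()]

# The shared key columns let the tables be combined row by row.
combined = countries_only.merge(
    people_demo, on=["Entity", "Code", "Year"], how="left"
)
print(combined)
```

Filtering on a non-null "Code" is one way to keep aggregates such as regions out of country rankings.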

💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the top 5 countries with the highest internet use (by population share)?
  2. How many people had internet access in those countries in 2019?
  3. What are the top 5 countries with the highest internet use for each of the following regions: 'Middle East & North Africa', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'Europe & Central Asia'?
  4. Create a visualization for those regions' internet usage over time.
  5. What are the 5 countries with the most internet users?
  6. What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
  7. Summarize your findings.
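
One possible approach to question 6 is to restrict both tables to 2019, join them on the shared keys, and compute a Pearson correlation. The sketch below uses made-up `*_demo` frames (hypothetical names and values), so its result says nothing about the real data. Note that "Broadband_Subscriptions" is a count while usage is a population share, so on real data the correlation may be affected by country size.

```python
import pandas as pd

keys = ["Entity", "Code", "Year"]

# Made-up rows for illustration only; the 2018 row shows the year filter at work.
internet_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country C", "Country D", "Country A"],
    "Code": ["AAA", "BBB", "CCC", "DDD", "AAA"],
    "Year": [2019, 2019, 2019, 2019, 2018],
    "Internet_Usage": [20.0, 40.0, 60.0, 80.0, 15.0],
})
broadband_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country C", "Country D"],
    "Code": ["AAA", "BBB", "CCC", "DDD"],
    "Year": [2019, 2019, 2019, 2019],
    "Broadband_Subscriptions": [100, 200, 300, 400],
})

# Keep only the 2019 rows of each table.
internet_2019 = internet_demo[internet_demo["Year"] == 2019]
broadband_2019 = broadband_demo[broadband_demo["Year"] == 2019]

# An inner join keeps entities present in both tables.
merged = internet_2019.merge(broadband_2019, on=keys, how="inner")

# Series.corr uses the Pearson method by default.
correlation = merged["Internet_Usage"].corr(merged["Broadband_Subscriptions"])
print(round(correlation, 3))
```

The toy values are perfectly linear, so the printed coefficient is close to 1; real data will differ.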

Note: This is how the World Bank defines the different regions.
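
None of the three tables has a region column, so question 3 needs a country-to-region lookup from elsewhere. The sketch below assumes such a lookup exists as a frame with "Code" and "Region" columns; `region_demo`, `usage_demo`, and all values in them are hypothetical, and `top_n` is set to 1 only to keep the toy output short.

```python
import pandas as pd

# Hypothetical country-to-region lookup (an assumption, not part of the data).
region_demo = pd.DataFrame({
    "Code": ["AAA", "BBB", "CCC", "DDD"],
    "Region": ["South Asia", "South Asia", "North America", "North America"],
})

# Made-up usage rows for a single year.
usage_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country C", "Country D"],
    "Code": ["AAA", "BBB", "CCC", "DDD"],
    "Year": [2019, 2019, 2019, 2019],
    "Internet_Usage": [30.0, 55.0, 90.0, 85.0],
})

top_n = 1  # the challenge asks for 5; 1 keeps this toy example small

# Attach a region to each country, then keep the top rows within each region.
with_region = usage_demo.merge(region_demo, on="Code", how="inner")
top_per_region = (
    with_region.sort_values("Internet_Usage", ascending=False)
    .groupby("Region")
    .head(top_n)
    .sort_values(["Region", "Internet_Usage"], ascending=[True, False])
)
print(top_per_region[["Region", "Entity", "Internet_Usage"]])
```

Sorting before `groupby(...).head(top_n)` is what makes the first rows of each group the highest-usage ones.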

Question 1: What are the top 5 countries with the highest internet use (by population share)?

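# Sort by most recent year first, then by highest usage share within each year
internet_sorted = internet.sort_values(by=['Year', 'Internet_Usage'], ascending=False)
# The first five rows are the five entities with the highest share in the latest year
internet_top_5 = internet_sorted.head(5)
top_5_list = internet_top_5['Entity'].tolist()
print(internet_top_5)
print(top_5_list)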

The five countries with the highest share of the population using the internet were Bahrain, Qatar, Kuwait, the United Arab Emirates, and Denmark. This ranking reflects only the most recent year in the data, 2019.
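
Because the ranking covers only the latest year, it can help to make that filter explicit and to exclude non-country entities (those with a null "Code"). The following is a sketch on made-up rows, not the notebook's actual method; `usage_demo` and its values are hypothetical.

```python
import pandas as pd

# Made-up rows: one country has an older, higher value; one entity is a region.
usage_demo = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country B", "Region X"],
    "Code": ["AAA", "AAA", "BBB", None],
    "Year": [2018, 2019, 2019, 2019],
    "Internet_Usage": [99.0, 70.0, 80.0, 95.0],
})

# Find the most recent year present in the table.
latest_year = usage_demo["Year"].max()

# Keep rows from that year that belong to countries (non-null Code).
latest_countries = usage_demo[
    (usage_demo["Year"] == latest_year) & usage_demo["Code"].notna()
]

# nlargest returns the rows with the highest values, already sorted.
top_demo = latest_countries.nlargest(2, "Internet_Usage")
print(top_demo[["Entity", "Year", "Internet_Usage"]])
```

In this toy table, neither the older 2018 value nor the region row can enter the ranking.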

Question 2: How many people had internet access in those countries in 2019?

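# Keep only the 2019 rows of the people table
people_w_access_2019 = people[people['Year'] == 2019]
# Restrict to the top 5 countries from Question 1 and sort by number of users
people_w_access_2019_top_5 = people_w_access_2019[people_w_access_2019['Entity'].isin(top_5_list)].sort_values(by=['Users'], ascending=False)
print(people_w_access_2019_top_5)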
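
A filter with `isin` silently drops any listed country that has no 2019 row, so a quick check for missing names may be worthwhile before reporting the numbers. This is a sketch under assumed data; `top_demo_list`, `people_demo`, and their values are hypothetical.

```python
import pandas as pd

# Hypothetical list of top countries and made-up user counts.
top_demo_list = ["Country A", "Country B", "Country C"]
people_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country A"],
    "Code": ["AAA", "BBB", "AAA"],
    "Year": [2019, 2019, 2018],
    "Users": [900, 4000, 850],
})

# Same filter as in the cell above: 2019 rows for the listed countries.
users_2019 = people_demo[
    (people_demo["Year"] == 2019) & people_demo["Entity"].isin(top_demo_list)
]

# Names in the list that have no matching 2019 row.
missing = sorted(set(top_demo_list) - set(users_2019["Entity"]))

# Combined number of users across the countries that were found.
total_users = int(users_2019["Users"].sum())

print(users_2019[["Entity", "Users"]])
print("Missing:", missing)
print("Total users:", total_users)
```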

🧑‍⚖️ Judging criteria

Response quality (weighting: 85%)
  • Accuracy (30%) - The response must be representative of the original data and free from errors.
  • Clarity (25%) - The response must be easy to understand and clearly expressed.
  • Completeness (30%) - The response must be a full report that responds to the question posed.
Presentation (weighting: 15%)
  • How legible/understandable the response is.
  • How well-formatted the response is.
  • Spelling and grammar.

In the event of a tie, earlier submission time will be used as a tie-breaker.

✅ Checklist before submitting your workspace

  • Rename your workspace to make it descriptive of your work. N.B., you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.