Internet: A Global Phenomenon
This dataset contains information on internet access around the world.
The workspace is set up with two CSV files containing information on global internet access for years ranging from 1990 to 2020.
internet_users.csv
- users: The number of people who have used the internet in the last three months
- share: The share of the entity's population who have used the internet in the last three months
adoption.csv
- fixed_telephone_subs: The number of people who have a telephone landline connection
- fixed_telephone_subs_share: The share of the entity's population who have a telephone landline connection
- fixed_broadband_subs: The number of people who have a broadband internet landline connection
- fixed_broadband_subs_share: The share of the entity's population who have a broadband internet landline connection
- mobile_cell_subs: The number of people who have a mobile subscription
- mobile_cell_subs_share: The share of the entity's population who have a mobile subscription
Both data files are indexed on the following 3 attributes:
- entity: The name of the country, region, or group.
- code: Unique id for the country (null for other entities).
- year: Year from 1990 to 2020.
Check out the guiding questions or the scenario described below to get started with this dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.
Source: Our World In Data
🌎 Some guiding questions to help you explore this data:
- What are the top 5 countries with the highest internet use (by population share)?
- What are the top 5 countries with the highest internet use for some large regions?
- What is the correlation between internet usage (population share) and broadband subscriptions for 2020?
Note: This is how the World Bank defines the different regions.
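One way to approach the correlation question is to join the two files on their shared index attributes and compute a Pearson correlation for a single year. The sketch below is only an illustration: the helper name `share_broadband_correlation` is hypothetical, the toy frames contain made-up values rather than real figures, and it assumes both files carry the `entity`, `code`, and `year` columns described above.

```python
import pandas as pd

def share_broadband_correlation(internet_users, adoption, year=2020):
    """Pearson correlation between internet-use share and broadband share for one year."""
    keys = ["entity", "code", "year"]
    # Inner join keeps only entity/year pairs present in both files
    merged = internet_users.merge(adoption, on=keys, how="inner")
    # Keep the requested year and drop rows where either value is missing
    cols = ["share", "fixed_broadband_subs_share"]
    subset = merged.loc[merged["year"] == year, cols].dropna()
    return subset["share"].corr(subset["fixed_broadband_subs_share"])

# Toy values for illustration only (not real figures)
toy_users = pd.DataFrame({
    "entity": ["A", "B", "C", "D"],
    "code": ["AAA", "BBB", "CCC", "DDD"],
    "year": [2020] * 4,
    "share": [20.0, 40.0, 60.0, 80.0],
})
toy_adoption = pd.DataFrame({
    "entity": ["A", "B", "C", "D"],
    "code": ["AAA", "BBB", "CCC", "DDD"],
    "year": [2020] * 4,
    "fixed_broadband_subs_share": [2.0, 9.0, 21.0, 30.0],
})

r = share_broadband_correlation(toy_users, toy_adoption)
print(round(r, 3))
```

With the real data, the same function could be called on the `internet_users` and `adoption` frames once they are loaded.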
🔍 Scenario: Identify emerging markets for a global internet provider
This scenario helps you develop an end-to-end project for your portfolio.
Background: You work for a global internet provider on a mission to provide affordable Internet access to everybody around the world using satellites. You are tasked with identifying which markets (regions or countries) are most worthwhile to focus efforts on.
Objective: Construct a top 5 list of countries where there is a big opportunity to roll out our services. Try to consider both the number of people without access to (good) wired or mobile internet and their spending power.
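As a starting point for the objective, the number of people without internet use can be estimated from the two columns in internet_users.csv: population is roughly `users / (share / 100)`, and the unconnected population is that estimate minus `users`. The sketch below assumes `share` is expressed as a percentage (0 to 100); the function name `unconnected_population` and the toy values are hypothetical. Spending power is not part of the two provided files, so it would have to come from an external source and is left out here.

```python
import pandas as pd

def unconnected_population(df):
    """Estimate people without internet use from `users` and `share` (percent)."""
    # Rows with a zero share cannot be used to back out the population
    out = df[df["share"] > 0].copy()
    out["population_est"] = out["users"] / (out["share"] / 100)
    out["unconnected_est"] = out["population_est"] - out["users"]
    # Largest estimated unconnected population first
    return out.sort_values("unconnected_est", ascending=False)

# Toy values for illustration only (not real figures)
toy = pd.DataFrame({
    "entity": ["A", "B", "C"],
    "code": ["AAA", "BBB", "CCC"],
    "year": [2020, 2020, 2020],
    "users": [1_000_000, 5_000_000, 200_000],
    "share": [50.0, 25.0, 80.0],
})

ranked = unconnected_population(toy)
print(ranked[["entity", "unconnected_est"]].head(5))
```

Ranking by the absolute number of unconnected people favors large countries; ranking by share alone would favor small ones, so a combined score may be worth exploring.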
You can query the pre-loaded CSV files using SQL directly. Here’s a sample query:
SELECT *
FROM 'internet_users.csv'
LIMIT 10
import pandas as pd
internet_users = pd.read_csv('internet_users.csv')
internet_users.head()
adoption = pd.read_csv('adoption.csv')
adoption.head()
Top 5 countries with the highest internet use by population
# Filter the data for the year 2020
internet_users_2020 = internet_users[internet_users['year'] == 2020]
# Keep only countries: other entities (regions, groups) have a null code
countries_2020 = internet_users_2020.dropna(subset=['code'])
# Sort by the share column in descending order and keep the top 5 countries
top5_countries_2020 = countries_2020.sort_values(by='share', ascending=False).head(5)
top5_countries_2020[['entity', 'share']]
import matplotlib.pyplot as plt
# List of top 5 countries for 2020
top5_countries_list = top5_countries_2020['entity'].tolist()
# Filter the internet_users data for these countries
top5_countries_data = internet_users[internet_users['entity'].isin(top5_countries_list)]
# Plotting the line chart
plt.figure(figsize=(14, 8))
for country in top5_countries_list:
    # Plot one line per country
    country_data = top5_countries_data[top5_countries_data['entity'] == country]
    plt.plot(country_data['year'], country_data['share'], label=country, marker='o')
plt.title('Internet Usage Over Time of the Top 5 Countries (1990-2020)')
plt.xlabel('Year')
plt.ylabel('Internet Usage Share (%)')
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.tight_layout()
plt.show()
import geopandas as gpd
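# Load the world map data from GeoPandas
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; on newer
# versions, download the Natural Earth shapefile and pass its path to gpd.read_file.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Merge the world map data with the internet usage data for 2020
# Note: a few countries (for example France and Norway) have iso_a3 set to '-99'
# in this dataset, so they will not match on code and will appear as missing.
world_internet_2020 = world.merge(internet_users_2020, left_on='iso_a3', right_on='code', how='left')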
# Plotting the choropleth map
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
world.boundary.plot(ax=ax, linewidth=1)
world_internet_2020.plot(column='share', ax=ax, legend=True,
                         legend_kwds={'label': "Internet Usage Share (%)"},
                         cmap='OrRd', missing_kwds={'color': 'lightgrey'})
ax.set_title('Internet Usage Around the World (2020)')
plt.show()
Top 5 countries with the highest internet use for some large regions
We can focus on specific large regions such as Asia, Africa, Europe, North America, and South America. We will extract the top country with the highest internet use from each of these regions for the year 2020. If regional information isn't explicitly available in the dataset, we can use an external source to identify countries belonging to these regions.
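Since the dataset does not ship a region column, one possible approach is to map each country code to a region from an external lookup and then take the highest shares within each region. The sketch below is a minimal illustration under that assumption: `region_by_code`, the region names, the helper `top_n_per_region`, and the toy values are all hypothetical placeholders for a real lookup such as the World Bank region list.

```python
import pandas as pd

# Hypothetical code-to-region lookup; a real one would come from an external source
region_by_code = {
    "AAA": "Region X", "BBB": "Region X", "CCC": "Region X",
    "DDD": "Region Y", "EEE": "Region Y",
}

def top_n_per_region(df, region_by_code, year=2020, n=5):
    """Return the n entities with the highest share in each mapped region."""
    # Keep one year and only rows with a country code
    sel = df[(df["year"] == year) & df["code"].notna()].copy()
    sel["region"] = sel["code"].map(region_by_code)
    # Drop countries without a region mapping or without a share value
    sel = sel.dropna(subset=["region", "share"])
    ordered = sel.sort_values(["region", "share"], ascending=[True, False])
    top = ordered.groupby("region").head(n)
    return top[["region", "entity", "share"]].reset_index(drop=True)

# Toy values for illustration only (not real figures)
toy = pd.DataFrame({
    "entity": ["A", "B", "C", "D", "E"],
    "code": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "year": [2020] * 5,
    "share": [10.0, 30.0, 20.0, 50.0, 70.0],
})

result = top_n_per_region(toy, region_by_code, n=2)
print(result)
```

Setting `n=1` gives the single top country per region, while `n=5` matches the guiding question.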