Andy Ly/

Internet: A Global Phenomenon

This dataset contains information on internet access around the world.

The workspace is set up with two CSV files containing information on global internet access for years ranging from 1990 to 2020.

  • internet_users.csv
    • users - The number of people who have used the internet in the last three months
    • share - The share of the entity's population who have used the internet in the last three months
  • adoption.csv
    • fixed_telephone_subs - The number of people who have a telephone landline connection
    • fixed_telephone_subs_share - The share of the entity's population who have a telephone landline connection
    • fixed_broadband_subs - The number of people who have a broadband internet landline connection
    • fixed_broadband_subs_share - The share of the entity's population who have a broadband internet landline connection
    • mobile_cell_subs - The number of people who have a mobile subscription
    • mobile_cell_subs_share - The share of the entity's population who have a mobile subscription

Both data files are indexed on the following 3 attributes:

  • entity - The name of the country, region, or group.
  • code - Unique id for the country (null for other entities).
  • year - Year from 1990 to 2020.
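
Because both files share the entity, code, and year attributes, they can be combined into a single table. The following is a minimal sketch, assuming pandas and the column names listed above; the helper name `merge_on_index` and the tiny sample frames are hypothetical and only illustrate the join.

```python
import pandas as pd

# Shared index attributes, as described in the list above
KEYS = ["entity", "code", "year"]

def merge_on_index(users: pd.DataFrame, adoption: pd.DataFrame) -> pd.DataFrame:
    """Outer-join the two tables on the shared keys (hypothetical helper)."""
    # An outer join keeps rows that appear in only one of the two files
    return users.merge(adoption, on=KEYS, how="outer")

# Made-up sample rows, not real data
sample_users = pd.DataFrame({
    "entity": ["A", "B"],
    "code": ["AAA", "BBB"],
    "year": [2020, 2020],
    "users": [10, 20],
    "share": [50.0, 40.0],
})
sample_adoption = pd.DataFrame({
    "entity": ["A"],
    "code": ["AAA"],
    "year": [2020],
    "fixed_broadband_subs": [3],
})
merged = merge_on_index(sample_users, sample_adoption)
```

With the real files, the two frames would come from `pd.read_csv('internet_users.csv')` and `pd.read_csv('adoption.csv')`.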

Check out the guiding questions or the scenario described below to get started with this dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

Source: Our World In Data

🌎 Some guiding questions to help you explore this data:

  1. What are the top 5 countries with the highest internet use (by population share)?
  2. What are the top 5 countries with the highest internet use for some large regions?
  3. What is the correlation between internet usage (population share) and broadband subscriptions for 2020?
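
One possible way to approach questions 1 and 3 with pandas is sketched below. It assumes that rows with a null code are regions or groups (as the attribute description states) and that a merged table holding both `share` and `fixed_broadband_subs_share` is available; the function names are hypothetical.

```python
import pandas as pd

def top5_by_share(users: pd.DataFrame, year: int) -> pd.DataFrame:
    """Top 5 countries by internet-user share in one year (hypothetical helper)."""
    # Assumption: a null code marks a region or group, so those rows are dropped
    countries = users[(users["year"] == year) & users["code"].notna()]
    return countries.nlargest(5, "share")[["entity", "share"]]

def share_broadband_corr(merged: pd.DataFrame, year: int) -> float:
    """Pearson correlation between user share and broadband share in one year."""
    rows = merged[merged["year"] == year]
    # Series.corr uses Pearson by default and ignores pairs with missing values
    return rows["share"].corr(rows["fixed_broadband_subs_share"])
```

If the null-code assumption does not hold for every aggregate in the files, an explicit exclusion list of entity names is a safer filter.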

Note: This is how the World Bank defines the different regions.

📊 Visualization ideas

  • Line chart: Display internet usage over time of the top 5 countries.
  • Map: Illustrate internet usage around the world in a given year on a map, for example with GeoPandas or Folium.
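
For the line chart idea, the data first needs one column per country and one row per year. This is a minimal sketch of that reshaping step, assuming pandas and the `entity`, `year`, and `share` columns described earlier; the helper name `usage_over_time` is hypothetical.

```python
import pandas as pd

def usage_over_time(users: pd.DataFrame, entities: list[str]) -> pd.DataFrame:
    """Reshape to one row per year and one column per entity (hypothetical helper)."""
    subset = users[users["entity"].isin(entities)]
    # Assumes each (entity, year) pair appears at most once; pivot raises otherwise
    wide = subset.pivot(index="year", columns="entity", values="share")
    return wide.sort_index()
```

Calling `.plot()` on the returned frame draws one line per entity when matplotlib is installed.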

🔍 Scenario: Identify emerging markets for a global internet provider

This scenario helps you develop an end-to-end project for your portfolio.

Background: You work for a global internet provider on a mission to provide affordable Internet access to everybody around the world using satellites. You are tasked with identifying which markets (regions or countries) are most worthwhile to focus efforts on.

Objective: Construct a top 5 list of countries where there is a big opportunity to roll out our services. Consider the number of people without access to (good) wired or mobile internet, as well as their spending power.
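
A possible starting point for the objective is to estimate how many people in each country are not yet online. The sketch below assumes that `share` is a percentage between 0 and 100, so that population can be approximated as users divided by share; both that assumption and the helper name `offline_population` are illustrative. Spending power is not part of these two files and would need an additional data source.

```python
import pandas as pd

def offline_population(users: pd.DataFrame, year: int) -> pd.DataFrame:
    """Estimate people without internet use per country (hypothetical helper)."""
    # Keep country rows for the chosen year; skip share == 0 to avoid dividing by zero
    mask = (users["year"] == year) & users["code"].notna() & (users["share"] > 0)
    rows = users[mask].copy()
    # Assumption: share is a percentage (0-100), so population ~ users / (share / 100)
    rows["population_est"] = rows["users"] / (rows["share"] / 100)
    rows["offline_est"] = rows["population_est"] - rows["users"]
    return rows.nlargest(5, "offline_est")[["entity", "offline_est"]]
```

Countries with a share of zero are dropped by this estimate, so they would need separate handling with actual population figures.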

You can query the pre-loaded CSV files using SQL directly. Here’s a sample query:

DataFrame available as variable df

SELECT *
FROM 'internet_users.csv'
LIMIT 1000

DataFrame available as variable df2

-- List every distinct entity (countries, regions, and groups)
SELECT DISTINCT entity
FROM 'internet_users.csv'

DataFrame available as variable df1

-- Top 5 entities by internet users (in millions) in 2000, excluding regions and income groups
SELECT entity, SUM(users) / 1000000 AS total_users_millions
FROM 'internet_users.csv'
WHERE entity NOT IN ('World', 'Asia', 'Upper-middle-income countries', 'Lower-middle-income countries', 'High-income countries', 'Europe', 'North America', 'South America', 'Africa') AND year = 2000
GROUP BY entity
ORDER BY total_users_millions DESC
LIMIT 5;

DataFrame available as variable df4

SELECT *
FROM 'adoption.csv'
LIMIT 1000

DataFrame available as variable df5

-- Top 20 entities by fixed telephone subscriptions in 2000, excluding regions and income groups
SELECT entity, fixed_telephone_subs
FROM 'adoption.csv'
WHERE entity NOT IN ('World', 'Asia', 'Middle-income countries', 'Upper-middle-income countries', 'High-income countries', 'Europe', 'North America', 'South America', 'Africa') AND year = 2000
GROUP BY entity, fixed_telephone_subs
ORDER BY fixed_telephone_subs DESC
LIMIT 20;

DataFrame available as variable df6

-- Top 20 entities by fixed broadband subscriptions in 2000, excluding regions and income groups
SELECT entity, fixed_broadband_subs
FROM 'adoption.csv'
WHERE entity NOT IN ('World', 'Asia', 'Middle-income countries', 'Upper-middle-income countries', 'High-income countries', 'Europe', 'North America', 'South America', 'Africa') AND year = 2000
GROUP BY entity, fixed_broadband_subs
ORDER BY fixed_broadband_subs DESC
LIMIT 20;

DataFrame available as variable df7

-- Top 20 entities by mobile subscriptions in 2000, excluding regions and income groups
SELECT entity, mobile_cell_subs
FROM 'adoption.csv'
WHERE entity NOT IN ('World', 'Asia', 'Middle-income countries', 'Upper-middle-income countries', 'High-income countries', 'Europe', 'North America', 'South America', 'Africa') AND year = 2000
GROUP BY entity, mobile_cell_subs
ORDER BY mobile_cell_subs DESC
LIMIT 20;

import pandas as pd

internet_users = pd.read_csv('internet_users.csv')

# Aggregate entities (regions and income groups) to exclude from country rankings
filtered_countries = ['World', 'Asia', 'Upper-middle-income countries', 'High-income countries', 'Europe', 'North America', 'South America', 'Africa', 'Lower-middle-income countries']

# Keep only rows whose entity is not one of the aggregates
internet_users = internet_users[~internet_users['entity'].isin(filtered_countries)]

# Sum users across all years per entity and keep the 5 largest totals
internet_users_top5 = internet_users.groupby('entity')[['users']].sum().nlargest(5, 'users')
internet_users_top5

# Select five countries of interest; copy so later changes do not touch internet_users
filtered_users = internet_users[internet_users['entity'].isin(['China', 'United States', 'India', 'Japan', 'Brazil'])].copy()
filtered_users

# Sort by entity and year (reassign instead of sorting in place)
filtered_users = filtered_users.sort_values(by=['entity', 'year'])
filtered_users


