Internet: A Global Phenomenon

    This dataset contains information on internet access around the world.

    The workspace is set up with two CSV files containing information on global internet access for years ranging from 1990 to 2020.

    • internet_users.csv
      • users - The number of people who have used the internet in the last three months
      • share - The share of the entity's population who have used the internet in the last three months
    • adoption.csv
      • fixed_telephone_subs - The number of people who have a telephone landline connection
      • fixed_telephone_subs_share - The share of the entity's population who have a telephone landline connection
      • fixed_broadband_subs - The number of people who have a broadband internet landline connection
      • fixed_broadband_subs_share - The share of the entity's population who have a broadband internet landline connection
      • mobile_cell_subs - The number of people who have a mobile subscription
      • mobile_cell_subs_share - The share of the entity's population who have a mobile subscription

    Both data files are indexed on the following 3 attributes:

    • entity - The name of the country, region, or group.
    • code - Unique id for the country (null for other entities).
    • year - Year from 1990 to 2020.

    Check out the guiding questions or the scenario described below to get started with this dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

    Source: Our World In Data

    🌎 Some guiding questions to help you explore this data:

    1. What are the top 5 countries with the highest internet use (by population share)?
    2. What are the top 5 countries with the highest internet use for some large regions?
    3. What is the correlation between internet usage (population share) and broadband subscriptions for 2020?

    Note: This is how the World Bank defines the different regions.
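The third guiding question can be approached by joining the two files on their shared index columns and correlating the two share columns. The sketch below is one possible way to do it, not the notebook's own solution; the helper name `usage_broadband_correlation` and the small frames are made up for illustration, and only the column names come from the dataset description.

```python
import pandas as pd

def usage_broadband_correlation(internet_users, adoption, year=2020):
    """Pearson correlation between internet usage share and fixed broadband share."""
    keys = ["entity", "code", "year"]
    merged = internet_users.merge(adoption, on=keys, how="inner")
    # Keep country rows (non-null code) for the requested year only
    rows = merged[(merged["year"] == year) & merged["code"].notna()]
    # Drop countries where either value is missing before correlating
    pair = rows[["share", "fixed_broadband_subs_share"]].dropna()
    return pair["share"].corr(pair["fixed_broadband_subs_share"])

# Made-up data, only to show the call; real values come from the CSV files
usage_toy = pd.DataFrame({
    "entity": ["A", "B", "C", "A"],
    "code": ["AAA", "BBB", "CCC", "AAA"],
    "year": [2020, 2020, 2020, 2019],
    "share": [10.0, 20.0, 30.0, 5.0],
})
adoption_toy = pd.DataFrame({
    "entity": ["A", "B", "C", "A"],
    "code": ["AAA", "BBB", "CCC", "AAA"],
    "year": [2020, 2020, 2020, 2019],
    "fixed_broadband_subs_share": [1.0, 2.0, 3.0, 9.0],
})
r = usage_broadband_correlation(usage_toy, adoption_toy)
print(round(r, 3))
```

With the real files, the call would be `usage_broadband_correlation(internet_users, adoption)` once both CSVs are loaded.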

    📊 Visualization ideas

    • Line chart: Display internet usage over time of the top 5 countries.
    • Map: Illustrate internet usage around the world in a given year on a map, for example with GeoPandas or Folium.
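For the line chart idea, the data first needs one column per country and one row per year. The following is a minimal sketch of that reshaping step, assuming the column names from the dataset description; the function name `top_countries_over_time` and the toy rows are hypothetical.

```python
import pandas as pd

def top_countries_over_time(internet_users, n=5, year=2020):
    """Return a year-by-country table of usage share for the top n countries in `year`."""
    # Countries only: regions and groups have no code
    countries = internet_users[internet_users["code"].notna()]
    top = countries[countries["year"] == year].nlargest(n, "share")["entity"]
    subset = countries[countries["entity"].isin(top)]
    # Wide layout: index = year, one column per country
    return subset.pivot(index="year", columns="entity", values="share")

# Made-up rows, only to show the shape of the result
toy = pd.DataFrame({
    "entity": ["A", "A", "B", "B", "C", "C", "Region", "Region"],
    "code": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC", None, None],
    "year": [2019, 2020] * 4,
    "share": [50, 60, 70, 80, 10, 20, 99, 99],
})
wide = top_countries_over_time(toy, n=2, year=2020)
print(wide)
```

Calling `wide.plot()` on the result draws one line per country, provided matplotlib is installed.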

    🔍 Scenario: Identify emerging markets for a global internet provider

    This scenario helps you develop an end-to-end project for your portfolio.

    Background: You work for a global internet provider on a mission to provide affordable Internet access to everybody around the world using satellites. You are tasked with identifying which markets (regions or countries) are most worthwhile to focus efforts on.

    Objective: Construct a top 5 list of countries where there is a big opportunity to roll out our services. Consider the number of people without access to (good) wired or mobile internet, and their spending power.
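One possible starting point for the objective is to estimate how many people in each country are not yet online. The sketch below assumes that `share` is a percentage between 0 and 100, so that `users / (share / 100)` approximates the population; that assumption, the helper name `rank_by_unconnected`, and the toy numbers are illustrative only. Spending power is not part of the two CSV files and is left out here.

```python
import pandas as pd

def rank_by_unconnected(internet_users, year=2020, n=5):
    """Rank countries by the estimated number of people who are not internet users."""
    mask = (internet_users["year"] == year) & internet_users["code"].notna()
    rows = internet_users[mask].copy()
    # A share of zero (or missing) cannot be used to back out a population estimate
    rows = rows[rows["share"] > 0]
    # Assumption: share is a percentage (0-100) of the population
    rows["population_est"] = rows["users"] / (rows["share"] / 100)
    rows["unconnected"] = rows["population_est"] - rows["users"]
    return rows.nlargest(n, "unconnected")[["entity", "users", "share", "unconnected"]]

# Made-up numbers, only to show the ranking
toy_users = pd.DataFrame({
    "entity": ["A", "B", "C"],
    "code": ["AAA", "BBB", "CCC"],
    "year": [2020, 2020, 2020],
    "users": [50, 10, 900],
    "share": [50.0, 10.0, 90.0],
})
ranking = rank_by_unconnected(toy_users, n=2)
print(ranking)
```

A country with a high share can still rank first when its population is large, which is why the absolute count is used rather than the share alone.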

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import geopandas as gpd
    import folium
    import branca
    
    internet_users = pd.read_csv('internet_users.csv')
    internet_users.head()
    adoption = pd.read_csv('adoption.csv')
    adoption.head()
    
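    # Question 1: top 5 entities by internet usage share in 2020
    top5_share = internet_users[internet_users['year'] == 2020].nlargest(5, 'share')
    top5_share[['entity', 'users', 'share']]
    # Question 2: 2020 rows without a share value (treated here as regional aggregates), 15 largest by users
    top5_regions = internet_users[(internet_users['year'] == 2020) & (internet_users['share'].isna())].nlargest(15, 'users')
    top5_regions[['entity', 'users']]
    # Keep 2020 rows for countries only; entities without a code are regions or groups
    internet_2020 = internet_users[(internet_users['year'] == 2020) & (internet_users['code'].notnull())]
    # Load the low-resolution Natural Earth polygons (gpd.datasets was removed in GeoPandas 1.0, so this needs an older release)
    world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
    # Merge on the three-letter country code; the inner join drops countries without a match
    merged = world.merge(internet_2020, left_on='iso_a3', right_on='code', how='inner')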
    
    fig, ax = plt.subplots(figsize=(15, 10))
    merged.boundary.plot(ax=ax)
    merged.plot(column='share', ax=ax, legend=True,
                legend_kwds={
                    'label': "Internet Usage Share by Country",
                    'shrink': 0.5
                },
                cmap='YlOrRd', linewidth=0.8, edgecolor='0.8')
    plt.show()
    
    # Create a color scale. You can adjust these values as needed.
    color_scale = branca.colormap.LinearColormap(
        colors=['yellow', 'green', 'blue'],
        index=merged['share'].quantile([0, 0.5, 1]),
        vmin=merged['share'].min(),
        vmax=merged['share'].max(),
        caption='Internet Usage Share'
    )
    
    def get_color(feature):
        value = feature['properties']['share']
        if value is None:
            return "#8c8c8c"  # grey color for missing values
        return color_scale(value)
    
    # Plotting using folium
    m = folium.Map([20,0], zoom_start=2)
    
    folium.GeoJson(
        merged.to_json(),
        name='Internet Usage Share',
        style_function=lambda feature: {
            'fillColor': get_color(feature),
            'color': 'black',
            'weight': 1,
            'dashArray': '5, 5'
        },
        highlight_function=lambda x: {'weight': 3},
        tooltip=folium.features.GeoJsonTooltip(
            fields=['entity', 'share'],
            aliases=['Country:', 'Internet Usage Share:'],
            localize=True
        )
    ).add_to(m)
    
    color_scale.add_to(m)  # add the color scale to the map
    
    m
    
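    # World-level internet usage share by year
    country_share = internet_users[internet_users.entity == "World"].dropna(subset=["code"])
    country_share
    world_share = country_share[['year', 'share']].reset_index(drop=True).set_index('year')
    world_share
    # Country-level adoption rows; entities without a code are excluded
    broadband_last = adoption[adoption.year >= 1990].dropna(subset=["code"])
    
    # Unweighted mean of each subscription share across countries, per year.
    # Selecting several columns after groupby needs a list, not a bare tuple.
    broadband = broadband_last.groupby('year')[['fixed_telephone_subs_share', 'fixed_broadband_subs_share', 'mobile_cell_subs_share']].mean()
    broadband.head()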
    plt.style.use("default")
    # Set up a plot
    plt.figure(figsize=(15, 8))
    sns.set_style("darkgrid")
    
    # Plot Global share
    sns.lineplot(x=world_share.index, y=world_share.share, color='blue', label="World's Internet Share")
    
    # Plot the fixed broadband share
    sns.lineplot(x=broadband.index, y=broadband.fixed_broadband_subs_share, color='red', label='Broadband Share')
    
    # Plot the fixed telephone share
    sns.lineplot(x=broadband.index, y=broadband.fixed_telephone_subs_share, color='green', label='Telephone Share')
    
    # Plot the mobile subscription share
    sns.lineplot(x=broadband.index, y=broadband.mobile_cell_subs_share, color='orange', label='Mobile Share')
    
    plt.xlabel("Years")
    plt.ylabel("Shares")
    
    plt.title('Internet and Broadband Shares')
    plt.legend()
    plt.tight_layout()
    plt.show()
    
    import requests
    
    # Fetch countries classified as lower-middle income from the World Bank API
    url = 'http://api.worldbank.org/v2/country?incomeLevel=LMC&format=json&per_page=300'
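    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop here on an HTTP error instead of parsing an error page
    data = response.json()
    
    # The response is a two-item list: data[0] is paging metadata, data[1] holds the country records
    lower_middle_income_countries = [country['name'] for country in data[1]]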
    
    print(lower_middle_income_countries)
    
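    # Match on country name; World Bank names can differ from the dataset's entity names, so some countries may be missed
    lower_middle_share = internet_users[(internet_users['entity'].isin(lower_middle_income_countries)) & (internet_users['year'] == 2020)]
    # Five lower-middle income countries with the smallest internet usage share in 2020
    lower_5 = lower_middle_share.nsmallest(5, 'share')
    lower_5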
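
Matching on country names can miss rows when the two sources spell a name differently. A possible alternative, sketched below, is to match on the three-letter code instead. It assumes that each country record in the World Bank response carries that code in its `id` field and that it lines up with the dataset's `code` column; the helper names and the sample payload are made up so the example runs without a network call.

```python
import pandas as pd

def iso3_codes(payload):
    """Collect country codes from a parsed World Bank country response."""
    # payload[0] is paging metadata, payload[1] is the list of country records
    return [country["id"] for country in payload[1]]

def lowest_share(internet_users, codes, year=2020, n=5):
    """Return the n rows with the smallest usage share among the given codes."""
    mask = internet_users["code"].isin(codes) & (internet_users["year"] == year)
    return internet_users[mask].nsmallest(n, "share")

# Hand-written stand-in for response.json(); the share values are invented
sample_payload = [
    {"page": 1},
    [{"id": "EGY", "name": "Egypt, Arab Rep."}, {"id": "IND", "name": "India"}],
]
sample_users = pd.DataFrame({
    "entity": ["Egypt", "India", "France"],
    "code": ["EGY", "IND", "FRA"],
    "year": [2020, 2020, 2020],
    "share": [72.0, 43.0, 85.0],
})
codes = iso3_codes(sample_payload)
result = lowest_share(sample_users, codes)
print(result[["entity", "share"]])
```

In the sample, "Egypt, Arab Rep." would not match the entity name "Egypt", while the code `EGY` does.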