Who are these 3,384 in Iceland? // XP Competition 2022

    How Much of the World Has Access to the Internet?

    Author

    Samvel Kocharyan, [email protected]
    https://www.linkedin.com/in/samvelkoch/
    2022

    💾 The data

    The research team compiled the following tables (source):
    internet
    • "Entity" - The name of the country, region, or group.
    • "Code" - Unique id for the country (null for other entities).
    • "Year" - Year from 1990 to 2019.
    • "Internet_Usage" - The share of the entity's population who have used the internet in the last three months.
    people
    • "Entity" - The name of the country, region, or group.
    • "Code" - Unique id for the country (null for other entities).
    • "Year" - Year from 1990 to 2020.
    • "Users" - The number of people who have used the internet in the last three months for that country, region, or group.
    broadband
    • "Entity" - The name of the country, region, or group.
    • "Code" - Unique id for the country (null for other entities).
    • "Year" - Year from 1998 to 2020.
    • "Broadband_Subscriptions" - The number of fixed subscriptions to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.

    Acknowledgments: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org.
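Since "Code" is described as null for regions and groups, it can be used to separate countries from aggregate entities before ranking. A minimal sketch, assuming the column layout described above; the rows are made-up toy values and "Some Region" is a hypothetical aggregate, not taken from the real table:

```python
import pandas as pd

# Toy rows shaped like the `internet` table; every value here is invented.
toy = pd.DataFrame({
    "Entity": ["Iceland", "Some Region", "Denmark"],
    "Code": ["ISL", None, "DNK"],  # null Code marks a non-country entity
    "Year": [2019, 2019, 2019],
    "Internet_Usage": [99.0, 80.0, 98.0],
})

# Keep only rows that carry a country code.
countries_only = toy[toy["Code"].notna()]
# The remaining rows are regions or groups.
aggregates = toy[toy["Code"].isna()]

print(countries_only["Entity"].tolist())
print(aggregates["Entity"].tolist())
```

Whether the real files follow this convention for every aggregate is worth checking before relying on the filter.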

    # Import the libraries we need
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    # Read the internet table
    internet = pd.read_csv('data/internet.csv')

    1. What are the top 5 countries with the highest internet use (by population share)?

    # Let's explore the internet table
    internet.describe()
    
    # Look at the latest year in the table (2019) and sort by usage
    
    top5_internet_2019 = internet[internet['Year'] == 2019]
    top5_internet_2019_sorted = top5_internet_2019.sort_values(by=['Internet_Usage'], ascending=False)
    top5_internet_2019_sorted.head(5)
    

    TOP 5 countries with the highest internet use (by population share in 2019)

      1. 🇧🇭 Bahrain
      2. 🇶🇦 Qatar
      3. 🇰🇼 Kuwait
      4. 🇦🇪 UAE
      5. 🇩🇰 Denmark
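The single-year ranking can also be wrapped in a small reusable helper. This is only a sketch: `top_n_for_year` is a hypothetical name, and the toy rows are invented to show the mechanics rather than real usage figures.

```python
import pandas as pd

def top_n_for_year(df, year, n=5, value_col="Internet_Usage"):
    """Return the n entities with the largest value_col in the given year."""
    in_year = df[df["Year"] == year]
    # nlargest sorts descending and keeps the first n rows in one step.
    return in_year.nlargest(n, value_col)[["Entity", value_col]]

# Invented toy data: three entities, two years.
toy = pd.DataFrame({
    "Entity": ["A", "B", "C", "A", "B", "C"],
    "Year": [2018, 2018, 2018, 2019, 2019, 2019],
    "Internet_Usage": [10.0, 20.0, 30.0, 40.0, 60.0, 50.0],
})

top2 = top_n_for_year(toy, 2019, n=2)
print(top2)
```

`nlargest` gives the same result as `sort_values(...).head(n)` here and avoids sorting the whole frame.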
    # Over the dataset's full span (1990-2019), however, the
    # "top 5 by internet usage" can look quite different.
    
    # Use the median of Internet_Usage as the indicator rather than
    # the mean or sum, so that outliers do not distort the ranking.
    
    # String names avoid the deprecation warning for passing NumPy functions to agg
    top5_internet = internet.groupby('Entity')['Internet_Usage'].agg(['sum', 'mean', 'median']).sort_values(by="median", ascending=False).head(6)
    top5_internet
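To see why the median is less sensitive to outliers than the mean or sum, here is a minimal sketch on invented numbers. "Steady" and "Spiky" are hypothetical entities, not rows from the dataset.

```python
import pandas as pd

# "Steady" grows evenly; "Spiky" is low except for one extreme value.
toy = pd.DataFrame({
    "Entity": ["Steady"] * 5 + ["Spiky"] * 5,
    "Internet_Usage": [40, 45, 50, 55, 60, 1, 2, 3, 4, 95],
})

stats = toy.groupby("Entity")["Internet_Usage"].agg(["sum", "mean", "median"])
print(stats)
```

The single value of 95 pulls the mean for "Spiky" up to 21, while its median stays at 3, which better reflects its typical level.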
    # Hmm... Kosovo is at the top. But this country became independent only in 2008,
    # so it is not a fair entry in a ranking that spans 1990-2019.
    # Let's check the year of Kosovo's first data
    # in the 'internet' table.
    
    kosovo = internet[internet['Entity'] == 'Kosovo']
    kosovo
    
    # OK. Kosovo's first data is from 2017. That doesn't work for a top-5 ranking
    # that aggregates Internet_Usage statistics from 1990 onward.
    # Let's set Kosovo aside for now and explore the next leading countries.
    
    leaders = ['Iceland', 'Sweden', 'Denmark', 'Norway', 'Netherlands']
    internet[(internet['Entity'].isin(leaders)) & (internet['Year'] == 1990)].head(6)
    # Iceland, the leader of the ranking, has no data for 1990.
    # But in 1991 its internet usage was 0.5%.
    # And for the next 28 years Iceland stayed at the top. Fair enough for a leader.
    
    internet[internet['Code'] == 'ISL']['Year'].agg('min')
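    # Select only the numeric column: taking the median of the grouping key 'Entity' (strings) fails in recent pandas
    top5_internet_leaders = internet[internet['Entity'] != 'Kosovo'].groupby('Entity')[['Internet_Usage']].median().sort_values(by='Internet_Usage', ascending=False).head(5)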
    top5_internet_leaders
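Dropping Kosovo by name works for this one case. A more general option is to exclude any entity with too few years of data before taking the median. This is a sketch of that alternative, not the approach used above: `median_ranking` and the `min_years` threshold are hypothetical, and the toy rows are invented.

```python
import pandas as pd

def median_ranking(df, min_years, value_col="Internet_Usage"):
    """Rank entities by median value_col, keeping only those with
    at least min_years distinct years of data."""
    coverage = df.groupby("Entity")["Year"].nunique()
    keep = coverage[coverage >= min_years].index
    return (
        df[df["Entity"].isin(keep)]
        .groupby("Entity")[value_col]
        .median()
        .sort_values(ascending=False)
    )

# Invented toy data: "Old" has four years of data, "New" only one (and a high value).
toy = pd.DataFrame({
    "Entity": ["Old"] * 4 + ["New"],
    "Year": [2016, 2017, 2018, 2019, 2019],
    "Internet_Usage": [10.0, 30.0, 50.0, 70.0, 90.0],
})

ranking = median_ranking(toy, min_years=3)
print(ranking)
```

With `min_years=3`, "New" is excluded even though its single value is the highest. The right threshold for the real table is a judgment call.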