
How Much of the World Has Access to the Internet?

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import re
# Read the data
broadband = pd.read_csv('data/broadband.csv')
internet = pd.read_csv('data/internet.csv')
people = pd.read_csv('data/people.csv')

📖 Background

The following report aims to explore the state of internet access in the world, answering questions such as:

  • What are the top 5 countries with the highest internet use (by population share)?
  • How many people had internet access in those countries in 2019?
  • What are the top 5 countries with the highest internet use for each of the following regions: 'Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union'?
  • What are the 5 countries with the most internet users?
  • What is the correlation between internet usage (population share) and broadband subscriptions for 2019?

💾 The data

The research team compiled the following tables (source):
internet
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2019.
  • "Internet_Usage" - The share of the entity's population who have used the internet in the last three months.
people
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1990 to 2020.
  • "Users" - The number of people who have used the internet in the last three months for that country, region, or group.
broadband
  • "Entity" - The name of the country, region, or group.
  • "Code" - Unique id for the country (null for other entities).
  • "Year" - Year from 1998 to 2020.
  • "Broadband_Subscriptions" - The number of fixed subscriptions to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.

Acknowledgments: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org.
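All three tables share the keys Entity, Code and Year, and only country rows carry a Code. The sketch below uses made-up toy frames, not the real data. It shows how that structure could support the last question, the 2019 correlation between usage share and broadband subscriptions, assuming a plain Pearson correlation via pandas `Series.corr`:

```python
import pandas as pd

# Toy frames mimicking the schemas described above (values are made up for illustration)
internet = pd.DataFrame({
    'Entity': ['A', 'B', 'C', 'World'],
    'Code': ['AAA', 'BBB', 'CCC', None],
    'Year': [2019, 2019, 2019, 2019],
    'Internet_Usage': [90.0, 60.0, 30.0, 55.0],
})
broadband = pd.DataFrame({
    'Entity': ['A', 'B', 'C', 'World'],
    'Code': ['AAA', 'BBB', 'CCC', None],
    'Year': [2019, 2019, 2019, 2019],
    'Broadband_Subscriptions': [3000, 2000, 1000, 6000],
})

# Country rows are the ones with a non-null Code; aggregates (regions, World) have none
keys = ['Entity', 'Code', 'Year']
merged = (internet[internet['Code'].notna()]
          .merge(broadband, on=keys, how='inner')
          .query('Year == 2019'))

# Pearson correlation between usage share and subscription counts
r = merged['Internet_Usage'].corr(merged['Broadband_Subscriptions'])
print(round(r, 3))
```

Since Broadband_Subscriptions is a raw count rather than a share, normalizing it by population before correlating may be worth considering on the real data.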

💻Internet Usage ranking in 2019

In 2019, Arab states of the Persian Gulf dominate the list of countries with the highest internet use (per 100 inhabitants). Bahrain, the one with the fewest users (1.4M), tops the ranking, followed closely by Qatar (2.8M) and Kuwait (4.4M). The much more populous UAE (9.1M users) ranks 4th, and Denmark (5.6M) closes the top 5:

#Merging internet and people tables, keeping only country rows
countries=internet[internet['Code'].notnull()].merge(people,on=['Entity','Code','Year'],how='left').reset_index(drop=True)
countries=countries.rename(columns={'Entity':'Country'})

#Sorting values by Internet Usage
topcountries_2019usage=countries.query('Year==2019').sort_values('Internet_Usage',ascending=False)
topcountries_2019usage.iloc[0:5].reset_index(drop=True)
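The same top-5 selection, with user counts expressed in millions as quoted in the text, can be sketched with `DataFrame.nlargest`. The frame below is a hypothetical toy stand-in for `countries`; its names and numbers are illustrative only:

```python
import pandas as pd

# Toy country-level frame shaped like `countries` above; the numbers are illustrative only
countries = pd.DataFrame({
    'Country': ['P', 'Q', 'R', 'S', 'T', 'U'],
    'Year': [2019] * 6,
    'Internet_Usage': [99.0, 97.5, 96.0, 95.0, 94.0, 60.0],
    'Users': [1_200_000, 3_000_000, 500_000, 8_000_000, 2_500_000, 40_000_000],
})

# nlargest is a shortcut for sort_values(..., ascending=False).head(n)
top5 = countries.query('Year == 2019').nlargest(5, 'Internet_Usage')

# Express users in millions, rounded to one decimal
top5 = top5.assign(Users_M=(top5['Users'] / 1e6).round(1))
print(top5[['Country', 'Internet_Usage', 'Users_M']].to_string(index=False))
```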


🖥Internet Usage ranking in previous years

In previous years, however, Nordic countries and other small European countries and territories topped the usage ranking. The first Persian Gulf country (Bahrain) only entered the top 5 in 2016, so the dominance of Arab Persian Gulf countries at the top of internet usage is relatively recent:

#Indexes of the top 5 countries by Internet Usage in each Year (second level of the groupby result's MultiIndex)
top5year=countries.groupby('Year')['Internet_Usage'].nlargest(5).index.get_level_values(1)

#Using previous indexes to subset countries and adding ranks
countries_top5usage=countries.iloc[top5year].reset_index(drop=True)
countries_top5usage['rank']=[1,2,3,4,5]*(int(len(countries_top5usage)/5))

#Parameters for point plot
start=2015
end=2020
countries_top5usage_sub=countries_top5usage.query(f'Year>={start} and Year<{end}').reset_index(drop=True)
countries_top5usage_sub['Country(Code)']=countries_top5usage_sub['Country']+' ('+countries_top5usage_sub['Code']+')'

#Point plot inverting the y axis (rank 1 on top) and labeling each point with its country code
g3=sns.catplot(x='Year',y='rank',hue='Country(Code)',data=countries_top5usage_sub,kind='point',legend=True)
for ax in g3.axes.ravel():
    ax.invert_yaxis()
for idx,row in countries_top5usage_sub.iterrows():
    #Categorical x axis: years are drawn at positions 0,1,2,... so the offset is Year-start
    x = row['Year']-start
    y = row['rank']
    text = row['Code']
    plt.text(x+.05,y-.05,text)

plt.yticks([1,2,3,4,5])
plt.show()

#Parameters for point plot
start=2010
end=2015
countries_top5usage_sub=countries_top5usage.query(f'Year>={start} and Year<{end}').reset_index(drop=True)
countries_top5usage_sub['Country(Code)']=countries_top5usage_sub['Country']+' ('+countries_top5usage_sub['Code']+')'

#Point plot inverting the y axis (rank 1 on top) and labeling each point with its country code
g2=sns.catplot(x='Year',
               y='rank',
               hue='Country(Code)',
               data=countries_top5usage_sub,
               kind='point',
               legend=True
              )
for ax in g2.axes.ravel():
    ax.invert_yaxis()
for idx,row in countries_top5usage_sub.iterrows():
    #Categorical x axis: years are drawn at positions 0,1,2,... so the offset is Year-start
    x = row['Year']-start
    y = row['rank']
    text = row['Code']
    plt.text(x+.05,y-.05,text)

plt.yticks([1,2,3,4,5])
plt.show()
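The ranking step above repeats `[1,2,3,4,5]` once per year, which assumes every year has at least five countries. A sketch of an alternative on toy data (made-up values) that derives each rank from the row's position within its year using `groupby().cumcount()`:

```python
import pandas as pd

# Toy panel: 3 years x 6 countries with made-up usage shares
data = pd.DataFrame({
    'Country': list('ABCDEF') * 3,
    'Year': [2017] * 6 + [2018] * 6 + [2019] * 6,
    'Internet_Usage': [90, 85, 80, 70, 60, 50,
                       91, 92, 80, 75, 65, 55,
                       93, 92, 95, 70, 96, 40],
})

# Sort within each year by usage (descending), then keep the top 5 rows per year
top5 = (data.sort_values(['Year', 'Internet_Usage'], ascending=[True, False])
            .groupby('Year')
            .head(5)
            .copy())

# Rank = position within the year's top 5 (1 = highest usage);
# this still works if a year has fewer than 5 countries
top5['rank'] = top5.groupby('Year').cumcount() + 1
print(top5.to_string(index=False))
```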
#Reading scraped data
CountryRegion=pd.read_csv('data/CountryRegion.csv')
EUcountries=['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovak Republic', 'Slovenia', 'Spain','Sweden','United Kingdom']

AEScountries=[]
with open('data/AfricaEasternSouthern.txt') as f:
    for line in f:
        country=re.findall('<a.*?>(.+?)</a>',line)
        if len(country)>0:
            AEScountries.append(country[0])
AEScountries=[s.replace('Democratic Republic of Congo','Congo, Dem. Rep.') for s in AEScountries]
AEScountries=[s.replace('Sao Tome &amp; Principe','São Tomé and Principe') for s in AEScountries]

AWCcountries=[]
with open('data/AfricaWesternCentral.txt') as f:
    for line in f:
        country=re.findall('<a.*?>(.+?)</a>',line)
        if len(country)>0:
            AWCcountries.append(country[0])
AWCcountries=[s.replace('Republic of Congo','Congo, Rep.') for s in AWCcountries]
AWCcountries=[s.replace("Cote d'Ivoire","Côte d'Ivoire") for s in AWCcountries]
AWCcountries=[s.replace("Gambia","Gambia, The") for s in AWCcountries]

APGcountries=['Bahrain', 'Kuwait', 'Iraq', 'Oman', 'Qatar', 'Saudi Arabia','United Arab Emirates']

#Assigning Region to countries in CountryRegion
CountryRegion=CountryRegion[['Country Code','Table Name','Region']]
CountryRegion=CountryRegion.rename(columns={'Country Code': "Code"})
CountryRegion.loc[CountryRegion['Table Name'].isin(EUcountries),'Region']='European Union'
CountryRegion.loc[CountryRegion['Table Name'].isin(AWCcountries),'Region']='Africa Western and Central'
CountryRegion.loc[CountryRegion['Table Name'].isin(AEScountries),'Region']='Africa Eastern and Southern'
CountryRegion.loc[CountryRegion['Table Name'].isin(APGcountries),'Region']='Arab Persian Gulf'

selectedRegions=['Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union']

#Merging countries and CountryRegion
countries=countries.merge(CountryRegion[['Code','Region']],on='Code',how='left')[['Country','Code','Year','Internet_Usage','Users','Region']]
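The scraping loops above take the text of the first `<a>` link on each line and then patch HTML entities such as `&amp;` by hand. A sketch of the same extraction using the standard library's `html.unescape`, with an exact-match rename dictionary instead of `str.replace`. The sample HTML lines are invented, since the exact format of the saved .txt files is an assumption here:

```python
import html
import re

# Invented sample lines; assumed shape: one <a ...>Country</a> link per line
sample_lines = [
    '<li><a href="/country/mozambique">Mozambique</a></li>',
    '<li><a href="/country/sao-tome">Sao Tome &amp; Principe</a></li>',
    '<li>no link on this line</li>',
]

LINK_TEXT = re.compile(r'<a[^>]*>(.+?)</a>')

def extract_names(lines):
    """Return the text of the first <a> link on each line, with HTML entities decoded."""
    names = []
    for line in lines:
        match = LINK_TEXT.search(line)
        if match:
            names.append(html.unescape(match.group(1)))
    return names

# Scraped names can still differ from World Bank table names, so a rename map is kept
# (entry taken from the replacements above)
RENAMES = {'Sao Tome & Principe': 'São Tomé and Principe'}

names = [RENAMES.get(n, n) for n in extract_names(sample_lines)]
print(names)
```

An exact-match lookup only renames whole names, so a pattern like 'Republic of Congo' cannot accidentally rewrite part of 'Democratic Republic of Congo', which substring replacement would.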



🪐 Internet usage over time across regions

As seen below, internet usage in the Arab Persian Gulf countries (Bahrain, Kuwait, Iraq, Oman, Qatar, Saudi Arabia and the United Arab Emirates) has grown remarkably fast in recent years. Around 2015 these countries used the internet about as much as East Asia & Pacific and Latin America & Caribbean, and by 2019 their usage was on par with the EU and North America.

Another important insight from this analysis is the huge gap between Sub-Saharan Africa (the Africa Eastern and Southern, and Africa Western and Central regions) and the rest of the world: only around 20% of their population recently reported using the internet.


#Calculating Internet Usage for created Region: Arab Persian Gulf
#Each country's population is implied by Users*100/Internet_Usage; the region's usage is total users over total population
countriesAPG=countries[countries['Region']=='Arab Persian Gulf'].copy()
countriesAPG['Population']=countriesAPG['Users']*100/countriesAPG['Internet_Usage']
countriesAPG_grouped=countriesAPG.groupby('Year')[['Users','Population']].sum().reset_index()
countriesAPG_grouped['Internet_Usage']=countriesAPG_grouped['Users']*100/countriesAPG_grouped['Population']
countriesAPG_grouped['Entity']='Arab Persian Gulf'

#Joining countries and created region
selectedRegions_pd=pd.concat([internet.loc[internet['Entity'].isin(selectedRegions)],countriesAPG_grouped[['Year','Entity','Internet_Usage']]]).reset_index(drop=True)

#Plotting Internet Usage by Region by Year (palette passed explicitly: sns.color_palette alone only returns a palette, it does not set one)
sns.relplot(x='Year',
            y='Internet_Usage',
            data=selectedRegions_pd,
            kind='line',
            style='Entity',
            hue='Entity',
            palette='tab10',
            markers=True,
            dashes=False
            )
plt.show()
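The visual comparison between regions can also be checked numerically. A sketch on made-up toy values (not the real series) that pivots a long frame shaped like `selectedRegions_pd` into one column per region and computes each region's percentage-point gap to the European Union, using `DataFrame.pivot` (which raises if a Year/Entity pair is duplicated):

```python
import pandas as pd

# Toy long-format frame shaped like `selectedRegions_pd` (values are illustrative, not real)
regions = pd.DataFrame({
    'Year': [2015, 2015, 2015, 2019, 2019, 2019],
    'Entity': ['Arab Persian Gulf', 'European Union', 'South Asia'] * 2,
    'Internet_Usage': [55.0, 78.0, 20.0, 84.0, 85.0, 30.0],
})

# One row per year, one column per region
wide = regions.pivot(index='Year', columns='Entity', values='Internet_Usage')

# Percentage-point gap of every region relative to the European Union
gap = wide.sub(wide['European Union'], axis=0)
print(gap.round(1))
```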
✍Notes about methodology:
  • Since Brexit did not take effect until 31 January 2020, the United Kingdom is counted as part of the European Union
  • Regions were assigned to each country by scraping pages and datasets from worldbank.org (the resulting txt and csv files are in 'data/')
  • The 'Arab Persian Gulf' region was added for contrast; its internet usage was aggregated from the countries' Internet_Usage and Users figures as total users over total implied population (a population-weighted average)
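The aggregation in the last note amounts to a population-weighted average, where each country's population is implied by its Users and Internet_Usage. A toy check with two hypothetical countries (invented numbers) showing that "total users over total implied population" equals the population-weighted mean and differs from a naive unweighted mean:

```python
import pandas as pd

# Two hypothetical countries in one region (toy numbers)
df = pd.DataFrame({
    'Country': ['X', 'Y'],
    'Internet_Usage': [90.0, 40.0],   # % of population
    'Users': [900_000, 4_000_000],    # people
})

# Implied population: Users / (Internet_Usage / 100)
df['Population'] = df['Users'] * 100 / df['Internet_Usage']

# Regional share = total users / total implied population
regional = df['Users'].sum() * 100 / df['Population'].sum()

# Population-weighted mean of the country shares, and the naive unweighted mean
weighted = (df['Internet_Usage'] * df['Population']).sum() / df['Population'].sum()
naive = df['Internet_Usage'].mean()
print(round(regional, 2), round(weighted, 2), naive)
```

The larger, less connected country pulls the regional figure well below the simple mean, which is why weighting by population matters for a region that mixes small and large countries.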