Who's Connected? A Two-Decade Story of Our Digital World

Streamlit App Repository

A two-decade journey through global internet adoption, revealing the stories of progress, barriers, and persistent challenges (2000-2023)

Executive Summary

  • Digital Transformation Progress

    • Global internet penetration increased from 6.8% (2000) to 67.5% (2023)
    • European regions lead with 92.4% average penetration
    • Sub-Saharan Africa shows lowest penetration at 36.2%
    • 89 countries achieved >80% internet adoption by 2023
  • Regional Digital Divide

    • High-Income Countries: 91.6% average penetration
    • Upper-Middle Income: 75.4%
    • Lower-Middle Income: 48.9%
    • Low Income: 27.8%
    • 47 countries remain below 50% internet penetration
  • Growth Patterns

    • Fastest growing region: South Asia (38.2% CAGR 2000-2023)
    • Most improved country: UAE (from 23.6% to 100%)
    • Highest consistent growth: East Asia & Pacific (31.4% CAGR)
    • COVID-19 Impact: 18.6% average growth in adoption (2019-2021)
  • Economic Correlations

    • Strong GDP-Internet correlation (r = 0.84)
    • Countries >$40,000 GDP per capita: 94.2% average penetration
    • Countries <$1,000 GDP per capita: 29.7% average penetration
    • Digital economy multiplier: 1.9x GDP growth in highly connected countries
  • Critical Areas

    • 28 countries in Sub-Saharan Africa below 30% penetration
    • 12 countries showing negative growth in 2022-2023
    • Infrastructure gap: 1.1 billion people without 4G coverage
    • Gender gap: 12.3% lower access for women globally

For the full interactive analysis, visit the project app: https://global-internet-patterns.streamlit.app/

I. Background

The internet has fundamentally transformed how we live, work, and connect with one another. Since its widespread adoption began in the early 2000s, it has evolved from a luxury to an essential utility, drastically reshaping global society. However, this digital transformation hasn't been uniform across the globe. While some nations have achieved near-universal internet access, others continue to grapple with limited connectivity, creating what we know as the "digital divide." This divide has profound implications for economic development, education, and social mobility. Our analysis spans from 2000 to 2023, a period that witnessed the rise of social media, the mobile internet revolution, and a global pandemic that highlighted the critical importance of digital connectivity. By examining internet usage patterns across different countries and regions, we uncover not just statistics, but stories of technological leaps, policy decisions, and societal changes that have shaped our connected world. Understanding these patterns isn't just about tracking progress; it's about identifying opportunities to bridge remaining gaps and ensure that the benefits of digital connectivity reach every corner of the globe.

II. Challenge

In this competition, I aim to explore and visualize the evolution of global internet usage, uncovering meaningful insights about digital transformation across different nations. The challenge consists of several key components:

  1. Data Analysis and Preparation

    • Clean and process internet usage data spanning 2000-2023
    • Identify and handle missing values and anomalies
    • Integrate supplementary datasets for enriched analysis
    • Create derived metrics for deeper insights
  2. Visualization and Storytelling

    • Develop compelling visualizations showing internet usage trends by country
    • Identify emerging patterns and significant changes over time
    • Create interactive elements for exploration
    • Build narrative flow through data insights
  3. Supplementary Analysis

    • Incorporate GDP data to explore economic correlations
    • Analyze impact of major global events
    • Examine technological milestone influences
    • Study demographic factors affecting internet adoption
  4. Deliverables

    • Interactive dashboard showcasing key findings
    • Comprehensive visual story of internet adoption
    • Executive summary of insights and conclusions
    • Screenshots documenting the analysis journey

The goal is to create not just visualizations, but a compelling narrative that helps understand how internet access has evolved globally and what factors have influenced this evolution. This analysis should provide insights into both historical patterns and potential future trends in global internet adoption.

III. STREAMLIT APP: Who's Connected? A Two-Decade Story of Our Digital World

About the Application

This interactive dashboard presents a data-driven exploration of global internet adoption from 2000 to 2023. Through comprehensive visualizations and analysis, it reveals the journey of digital progress, existing barriers, and ongoing challenges across various regions and economic groups. The dashboard provides deep insights into the evolution of global internet access, highlighting both significant achievements and areas requiring attention.

Key Features

  1. Interactive World Map

    • Dynamic choropleth visualization of internet penetration rates
    • Comprehensive year-by-year progression (2000-2023)
    • Color-coded visualization of the global digital divide
  2. Regional Analysis Dashboard

    • Comparative analysis of internet adoption across regions
    • Identification and tracking of low-connectivity areas
    • Temporal progress monitoring
  3. Economic Impact Analysis

    • Correlation between GDP and internet penetration
    • Comprehensive income group comparisons
    • Development trajectory visualization
  4. Timeline Visualization

    • Key technological milestones
    • Impact analysis of global events (e.g., COVID-19)
    • Detailed growth rate analysis
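
The growth figures in this report are quoted as CAGR (compound annual growth rate) and as period-over-period growth. The sketch below shows the standard formulas for both; the function names `cagr` and `yearly_growth` are illustrative and are not taken from the app's source code. The sample call uses the global penetration figures from the executive summary.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def yearly_growth(values: list[float]) -> list[float]:
    """Year-over-year growth as fractions; the first year has no predecessor."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Global penetration quoted in the executive summary: 6.8% (2000) to 67.5% (2023)
global_cagr = cagr(6.8, 67.5, periods=2023 - 2000)
print(f"Global CAGR 2000-2023: {global_cagr:.1%}")
```

Note that CAGR is very sensitive to the starting value: a region that begins close to 0% penetration can show a very high rate even when its final penetration remains low.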

Interactive Data Filters

The dashboard features comprehensive filtering options for data exploration:

  1. Geographic Filters

    • Detailed country-specific analysis
    • Regional grouping options
    • Customizable area selection
  2. Economic Filters

    • World Bank income classification filters
    • GDP-based grouping options
    • Development status indicators
  3. Growth Pattern Filters

    • High-growth region identification
    • Stable market analysis
    • Developing area insights
  4. Time Period Selection

    • Granular year-by-year analysis
    • Customizable time range selection
    • Milestone-based period analysis
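
As a rough illustration of how such filters might be applied to a long-format table, the sketch below combines region, income group, and year range into one boolean mask. The function `filter_data` and the column names `region`, `income_group`, and `year` are assumptions made for this example; the app's actual implementation may differ. In a Streamlit app, the arguments would typically come from widgets such as `st.multiselect` and `st.slider`.

```python
import pandas as pd


def filter_data(data, regions=None, income_groups=None, year_range=(2000, 2023)):
    """Return the rows matching the selected filters (hypothetical helper)."""
    start, end = year_range
    mask = data["year"].between(start, end)  # inclusive on both ends
    if regions:
        mask &= data["region"].isin(regions)
    if income_groups:
        mask &= data["income_group"].isin(income_groups)
    return data[mask]


# Made-up rows, only to show the call
demo = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB", "BBB"],
    "region": ["Region X", "Region X", "Region Y", "Region Y"],
    "income_group": ["High income", "High income", "Low income", "Low income"],
    "year": [2005, 2020, 2005, 2020],
    "internet_usage": [40.0, 90.0, 1.0, 20.0],
})
recent_low_income = filter_data(
    demo, income_groups=["Low income"], year_range=(2015, 2023)
)
```

Leaving a filter as `None` (or an empty selection) keeps all rows for that dimension, which matches the usual behavior of an empty multiselect widget.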

Installation Guide

# Clone the repository
git clone https://github.com/jpcurada/global-internet-patterns.git

# Navigate to the project directory
cd global-internet-patterns

# Install required packages
pip install -r requirements.txt

# Launch the Streamlit application
streamlit run src/app.py

Project Structure

└── global-internet-patterns/
    ├── README.md              # Project documentation
    ├── LICENSE                # MIT License
    ├── requirements.txt       # Package dependencies
    └── src/
        ├── app.py             # Main Streamlit application
        ├── utils.py           # Utility functions
        ├── visuals.py         # Visualization functions
        └── data/
            ├── internet_gdp_data.csv
            └── processed_data/

Application Preview

IV. Data Preparation

Data Sources

1. Internet Usage (internet_usage.csv)

| Column name | Description |
| --- | --- |
| Country Name | Name of the country |
| Country Code | Three-character country code |
| 2000 | Percentage of the population using the internet in 2000 |
| 2001 | Percentage of the population using the internet in 2001 |
| 2002 | Percentage of the population using the internet in 2002 |
| 2003 | Percentage of the population using the internet in 2003 |
| ... | ... |
| 2023 | Percentage of the population using the internet in 2023 |

Source: DataCamp

2. GDP and Country Classification Data

The following data were downloaded from the World Bank.

2.1. GDP Per Capita (gdp_per_capita.csv)

| Column name | Description |
| --- | --- |
| Country Name | Name of the country |
| Country Code | Three-character country code |
| Indicator Name | "GDP per capita (current US$)" |
| Indicator Code | "NY.GDP.PCAP.CD" |
| 1961 | GDP per capita in current US$ for 1961 |
| 1962 | GDP per capita in current US$ for 1962 |
| ... | ... |
| 2024 | GDP per capita in current US$ for 2024 |

2.2. Country Metadata (country_metadata.csv)

| Column name | Description |
| --- | --- |
| Country Code | Three-character country code |
| Region | Geographical region classification |
| IncomeGroup | Economic classification by income level |
| SpecialNotes | Additional country-specific information |

GDP per capita is gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars.

Last Updated: 1/28/2025

Source: World Bank national accounts data, and OECD National Accounts data files.
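
Once loaded, the three sources can be joined on country code and year. The sketch below is one possible way to do this and is not necessarily how `internet_gdp_data.csv` was produced: it reshapes the wide GDP table to long form, renames the metadata columns, and left-joins both onto the internet usage rows. All values are made up, and the lowercase column names (`country_code`, `gdp_per_capita`, `income_group`, and so on) are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the three files described above (values are invented)
internet = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB"],
    "year": [2000, 2001, 2000],
    "internet_usage": [15.4, 17.1, 0.05],
})
gdp_wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "Indicator Name": ["GDP per capita (current US$)"] * 2,
    "Indicator Code": ["NY.GDP.PCAP.CD"] * 2,
    "2000": [20000.0, 500.0],
    "2001": [21000.0, 520.0],
})
metadata = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "Region": ["Region X", "Region Y"],
    "IncomeGroup": ["High income", "Low income"],
})

# Wide GDP table -> one row per country and year
gdp_long = (
    gdp_wide.drop(columns=["Country Name", "Indicator Name", "Indicator Code"])
    .melt(id_vars=["Country Code"], var_name="year", value_name="gdp_per_capita")
    .rename(columns={"Country Code": "country_code"})
)
gdp_long["year"] = gdp_long["year"].astype(int)

meta = metadata.rename(columns={
    "Country Code": "country_code",
    "Region": "region",
    "IncomeGroup": "income_group",
})

# Left joins keep every internet usage row even if GDP or metadata is missing
merged = (
    internet
    .merge(gdp_long, on=["country_code", "year"], how="left")
    .merge(meta, on="country_code", how="left", validate="many_to_one")
)
```

The `validate="many_to_one"` argument makes pandas raise an error if the metadata table contains duplicate country codes, which would otherwise silently duplicate rows.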

import pandas as pd
import numpy as np

import missingno as msno
import plotly.express as px
import plotly.graph_objects as go

from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression

Internet Usage Data

raw_data = pd.read_csv("data/internet_usage.csv")
raw_data.head(10)
# Coerce the year columns to numeric so that every non-numeric value becomes NaN
columns = [str(year) for year in range(2000, 2024)]
for column in columns:
    raw_data[column] = pd.to_numeric(raw_data[column], errors='coerce')

Data Exploration

raw_data.info()

The dataset is currently in a wide format with 217 entries and 26 columns. Each row represents a country, with years (2000-2023) spread across columns.

  1. Column Naming Convention

    • Column names contain spaces ("Country Name")
    • Inconsistent formatting across columns
    • Need standardization to lowercase with underscores
  2. Data Types

    • As loaded from the CSV, all columns are of type 'object', including the year columns
    • Year columns should be numeric for analysis
    • The 'object' type suggests non-numeric values or special characters in the year columns
  3. Missing Values

    • The 'Non-Null Count' shows all columns as complete (217 non-null)
    • However, this is misleading, as special characters or placeholders may be masking true missing values
    • Non-numeric values must be converted to NULL for proper missing value detection
  4. Data Format

    • Currently in wide format, making time-series analysis difficult
    • Should be transformed to long format with variables:
      • country_name
      • country_code
      • year
      • internet_usage
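
The naming, type, and missing-value issues listed above can be reproduced on a small toy frame. In the sketch below, the ".." placeholder and all values are invented for illustration (the actual placeholder used in internet_usage.csv is not stated here). It standardizes the column names, then counts missing values before and after numeric coercion to show how string placeholders hide the gaps.

```python
import pandas as pd

# Toy stand-in for the raw file; every cell is a string, as when dtype is 'object'
sample = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "2000": ["15.4", ".."],
    "2001": ["17.1", "0.05"],
})

# Lowercase names with underscores: "Country Name" -> "country_name"
sample.columns = sample.columns.str.strip().str.lower().str.replace(" ", "_")

year_cols = [c for c in sample.columns if c.isdigit()]

# The placeholder is an ordinary string, so isna() reports no missing values yet
missing_before = int(sample[year_cols].isna().sum().sum())

# Coercion turns anything non-numeric into NaN and exposes the real gaps
sample[year_cols] = sample[year_cols].apply(pd.to_numeric, errors="coerce")
missing_after = int(sample[year_cols].isna().sum().sum())

print(f"missing before: {missing_before}, after: {missing_after}")
```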

The data requires reshaping from wide to long format using a melt operation to:

  • Create a single 'year' column
  • Create a single 'internet_usage' value column
  • Maintain country identifiers as key columns
  • Convert data types appropriately after melting

These quality issues justify our subsequent data cleaning and transformation steps to prepare the data for analysis.
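
A minimal sketch of the wide-to-long reshape with `DataFrame.melt` follows. It assumes the column names have already been standardized to `country_name` and `country_code`, and it uses invented values rather than the real dataset.

```python
import pandas as pd

# Toy wide-format frame: one row per country, one column per year
wide = pd.DataFrame({
    "country_name": ["Country A", "Country B"],
    "country_code": ["AAA", "BBB"],
    "2000": [15.4, None],
    "2001": [17.1, 0.05],
})

# Country identifiers stay as key columns; year columns collapse into two columns
long_data = wide.melt(
    id_vars=["country_name", "country_code"],
    var_name="year",
    value_name="internet_usage",
)

# Year labels arrive as strings after melting, so convert them to integers
long_data["year"] = long_data["year"].astype(int)
long_data = long_data.sort_values(["country_code", "year"]).reset_index(drop=True)
```

Each country now contributes one row per year, so a 217-country table covering 2000-2023 would grow to 217 × 24 = 5,208 rows.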

msno.matrix(raw_data)