Executive Summary
- Digital Transformation Progress
  - Global internet penetration increased from 6.8% (2000) to 67.5% (2023)
  - European regions lead with 92.4% average penetration
  - Sub-Saharan Africa shows lowest penetration at 36.2%
  - 89 countries achieved >80% internet adoption by 2023
- Regional Digital Divide
  - High-Income Countries: 91.6% average penetration
  - Upper-Middle Income: 75.4%
  - Lower-Middle Income: 48.9%
  - Low Income: 27.8%
  - 47 countries remain below 50% internet penetration
- Growth Patterns
  - Fastest growing region: South Asia (38.2% CAGR 2000-2023)
  - Most improved country: UAE (from 23.6% to 100%)
  - Highest consistent growth: East Asia & Pacific (31.4% CAGR)
  - COVID-19 Impact: 18.6% average growth in adoption (2019-2021)
- Economic Correlations
  - Strong GDP-Internet correlation (r = 0.84)
  - Countries >$40,000 GDP per capita: 94.2% average penetration
  - Countries <$1,000 GDP per capita: 29.7% average penetration
  - Digital economy multiplier: 1.9x GDP growth in highly connected countries
- Critical Areas
  - 28 countries in Sub-Saharan Africa below 30% penetration
  - 12 countries showing negative growth in 2022-2023
  - Infrastructure gap: 1.1 billion people without 4G coverage
  - Gender gap: 12.3% lower access for women globally
For comprehensive outlook, visit the project link: https://global-internet-patterns.streamlit.app/
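The CAGR figures in the summary are compound annual growth rates. A minimal sketch of the calculation, assuming the standard definition CAGR = (end / start)^(1 / years) - 1 (the source does not show how its figures were computed); the helper name `cagr` is hypothetical, and the UAE endpoints are the ones quoted in the summary.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# UAE endpoints quoted in the summary: 23.6% (2000) to 100% (2023)
uae_cagr = cagr(23.6, 100.0, 2023 - 2000)
print(f"UAE CAGR 2000-2023: {uae_cagr:.1%}")
```

Because the formula divides by the starting value, regions that began close to zero in 2000 can show very high CAGRs even when their final penetration is modest.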
I. Background
The internet has fundamentally transformed how we live, work, and connect with one another. Since its widespread adoption began in the early 2000s, it has evolved from a luxury to an essential utility, drastically reshaping global society. However, this digital transformation hasn't been uniform across the globe. While some nations have achieved near-universal internet access, others continue to grapple with limited connectivity, creating what we know as the "digital divide." This divide has profound implications for economic development, education, and social mobility. Our analysis spans from 2000 to 2023, a period that witnessed the rise of social media, the mobile internet revolution, and a global pandemic that highlighted the critical importance of digital connectivity. By examining internet usage patterns across different countries and regions, we uncover not just statistics, but stories of technological leaps, policy decisions, and societal changes that have shaped our connected world. Understanding these patterns isn't just about tracking progress; it's about identifying opportunities to bridge remaining gaps and ensure that the benefits of digital connectivity reach every corner of the globe.
II. Challenge
In this competition, I aim to explore and visualize the evolution of global internet usage, uncovering meaningful insights about digital transformation across different nations. The challenge consists of several key components:
- Data Analysis and Preparation
  - Clean and process internet usage data spanning 2000-2023
  - Identify and handle missing values and anomalies
  - Integrate supplementary datasets for enriched analysis
  - Create derived metrics for deeper insights
- Visualization and Storytelling
  - Develop compelling visualizations showing internet usage trends by country
  - Identify emerging patterns and significant changes over time
  - Create interactive elements for exploration
  - Build narrative flow through data insights
- Supplementary Analysis
  - Incorporate GDP data to explore economic correlations
  - Analyze impact of major global events
  - Examine technological milestone influences
  - Study demographic factors affecting internet adoption
- Deliverables
  - Interactive dashboard showcasing key findings
  - Comprehensive visual story of internet adoption
  - Executive summary of insights and conclusions
  - Screenshots documenting the analysis journey
The goal is to create not just visualizations, but a compelling narrative that helps understand how internet access has evolved globally and what factors have influenced this evolution. This analysis should provide insights into both historical patterns and potential future trends in global internet adoption.
III. STREAMLIT APP: Who's Connected? A Two-Decade Story of Our Digital World
About the Application
This interactive dashboard presents a data-driven exploration of global internet adoption from 2000 to 2023. Through comprehensive visualizations and analysis, it reveals the journey of digital progress, existing barriers, and ongoing challenges across various regions and economic groups. The dashboard provides deep insights into the evolution of global internet access, highlighting both significant achievements and areas requiring attention.
Key Features
- Interactive World Map
  - Dynamic choropleth visualization of internet penetration rates
  - Comprehensive year-by-year progression (2000-2023)
  - Color-coded visualization of the global digital divide
- Regional Analysis Dashboard
  - Comparative analysis of internet adoption across regions
  - Identification and tracking of low-connectivity areas
  - Temporal progress monitoring
- Economic Impact Analysis
  - Correlation between GDP and internet penetration
  - Comprehensive income group comparisons
  - Development trajectory visualization
- Timeline Visualization
  - Key technological milestones
  - Impact analysis of global events (e.g., COVID-19)
  - Detailed growth rate analysis
Interactive Data Filters
The dashboard features comprehensive filtering options for data exploration:
- Geographic Filters
  - Detailed country-specific analysis
  - Regional grouping options
  - Customizable area selection
- Economic Filters
  - World Bank income classification filters
  - GDP-based grouping options
  - Development status indicators
- Growth Pattern Filters
  - High-growth region identification
  - Stable market analysis
  - Developing area insights
- Time Period Selection
  - Granular year-by-year analysis
  - Customizable time range selection
  - Milestone-based period analysis
Installation Guide
# Clone the repository
git clone https://github.com/jpcurada/global-internet-patterns.git
# Navigate to the project directory
cd global-internet-patterns
# Install required packages
pip install -r requirements.txt
# Launch the Streamlit application
streamlit run src/app.py
Project Structure
global-internet-patterns/
├── README.md            # Project documentation
├── LICENSE              # MIT License
├── requirements.txt     # Package dependencies
├── src/
│   ├── app.py           # Main Streamlit application
│   ├── utils.py         # Utility functions
│   └── visuals.py       # Visualization functions
└── data/
    ├── internet_gdp_data.csv
    └── processed_data/
Application Preview
IV. Data Preparation
Data Sources
1. Internet Usage (internet_usage.csv)
| Column name | Description |
|---|---|
| Country Name | Name of the country |
| Country Code | Three-character country code |
| 2000 | Percentage of the population using the internet in 2000 |
| 2001 | Percentage of the population using the internet in 2001 |
| 2002 | Percentage of the population using the internet in 2002 |
| 2003 | Percentage of the population using the internet in 2003 |
| ... | ... |
| 2023 | Percentage of the population using the internet in 2023 |
Source: DataCamp
2. GDP and Country Classification Data
The following data were downloaded from the World Bank.
2.1. GDP Per Capita (gdp_per_capita.csv)
| Column name | Description |
|---|---|
| Country Name | Name of the country |
| Country Code | Three-character country code |
| Indicator Name | "GDP per capita (current US$)" |
| Indicator Code | "NY.GDP.PCAP.CD" |
| 1961 | GDP per capita in current US$ for 1961 |
| 1962 | GDP per capita in current US$ for 1962 |
| ... | ... |
| 2024 | GDP per capita in current US$ for 2024 |
2.2. Country Metadata (country_metadata.csv)
| Column name | Description |
|---|---|
| Country Code | Three-character country code |
| Region | Geographical region classification |
| IncomeGroup | Economic classification by income level |
| SpecialNotes | Additional country-specific information |
GDP per capita is gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars.
Last Updated: 1/28/2025
Source: World Bank national accounts data, and OECD National Accounts data files.
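One way the GDP and metadata files could be combined is sketched below. It assumes the column names listed in the tables above; the small inline frames stand in for the real CSV files, their values are made up for illustration, and the variable names (`gdp_wide`, `metadata`, `gdp_enriched`) are hypothetical.

```python
import pandas as pd

# Toy stand-ins for gdp_per_capita.csv and country_metadata.csv
# (values are illustrative only, not real data)
gdp_wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "Indicator Name": ["GDP per capita (current US$)"] * 2,
    "Indicator Code": ["NY.GDP.PCAP.CD"] * 2,
    "2000": [1000.0, 20000.0],
    "2001": [1100.0, 20500.0],
})
metadata = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "Region": ["Region A", "Region B"],
    "IncomeGroup": ["Low income", "High income"],
})

# Reshape GDP from one column per year to one row per country-year
gdp_long = gdp_wide.melt(
    id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"],
    var_name="year",
    value_name="gdp_per_capita",
).drop(columns=["Indicator Name", "Indicator Code"])
gdp_long["year"] = gdp_long["year"].astype(int)

# Attach region and income group by country code;
# validate raises if a code appears more than once in the metadata
gdp_enriched = gdp_long.merge(
    metadata, on="Country Code", how="left", validate="many_to_one"
)
print(gdp_enriched)
```

Joining on the country code rather than the country name avoids mismatches caused by spelling differences between sources.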
import pandas as pd
import numpy as np
import missingno as msno
import plotly.express as px
import plotly.graph_objects as go
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression
Internet Usage Data
raw_data = pd.read_csv("data/internet_usage.csv")
raw_data.head(10)
# Coerce the year columns to numeric so that non-numeric entries become NaN
columns = [str(year) for year in range(2000, 2024)]
for column in columns:
    raw_data[column] = pd.to_numeric(raw_data[column], errors='coerce')
Data Exploration
raw_data.info()
The dataset is currently in a wide format with 217 entries and 26 columns. Each row represents a country, with years (2000-2023) spread across columns.
- Column Naming Convention
  - Column names contain spaces ("Country Name")
  - Inconsistent formatting across columns
  - Need standardization to lowercase with underscores
- Data Types
  - All columns are of type 'object', including year columns
  - Year columns should be numeric for analysis
  - The presence of 'object' type suggests potential non-numeric values or special characters
- Missing Values
  - The 'Non-Null Count' shows all columns as complete (217 non-null)
  - However, this is misleading as special characters or placeholders might be masking true missing values
  - Requires conversion of non-numeric values to NULL for proper missing value detection
- Data Format
  - Currently in wide format, making time-series analysis difficult
  - Should be transformed to long format with variables:
    - country_name
    - country_code
    - year
    - internet_usage
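The naming and missing-value points can be illustrated on a toy frame. This is a sketch rather than the project's cleaning code: the `..` placeholder is an assumed example of a non-numeric marker (the source does not say which placeholder the file uses), and the values are made up.

```python
import pandas as pd

# Toy wide-format frame; ".." is an assumed placeholder for missing values
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "2000": ["1.5", ".."],
    "2001": ["..", "12.0"],
})

# Every cell is a non-null string, so nothing is reported as missing yet
missing_before = int(toy.isna().sum().sum())

# Standardize column names: lowercase, spaces replaced by underscores
toy.columns = toy.columns.str.lower().str.replace(" ", "_")

# Coerce year columns to numeric; placeholders become NaN
year_cols = [c for c in toy.columns if c.isdigit()]
toy[year_cols] = toy[year_cols].apply(pd.to_numeric, errors="coerce")

missing_after = int(toy.isna().sum().sum())
print(missing_before, missing_after)  # prints: 0 2
```

The count jumps from 0 to 2 only after coercion, which is why a non-null count taken on 'object' columns cannot be trusted.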
The data requires reshaping from wide to long format using a melt operation to:
- Create a single 'year' column
- Create a single 'internet_usage' value column
- Maintain country identifiers as key columns
- Convert data types appropriately after melting
These quality issues justify our subsequent data cleaning and transformation steps to prepare the data for analysis.
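The reshaping described above could look roughly like this with `pandas.DataFrame.melt`. The frame is a made-up two-country sample and the snake_case column names are the target names listed above; treat it as a sketch under those assumptions, not the project's exact code.

```python
import pandas as pd

# Made-up sample in wide format: one row per country, one column per year
wide = pd.DataFrame({
    "country_name": ["Country A", "Country B"],
    "country_code": ["AAA", "BBB"],
    "2000": [1.5, None],
    "2001": [3.0, 12.0],
})

# Melt the year columns into a single 'year' / 'internet_usage' pair
long_df = wide.melt(
    id_vars=["country_name", "country_code"],
    var_name="year",
    value_name="internet_usage",
)

# Year labels come out of melt as strings, so convert them after reshaping
long_df["year"] = long_df["year"].astype(int)
long_df = long_df.sort_values(["country_code", "year"]).reset_index(drop=True)
print(long_df)
```

Each country-year pair becomes its own row, so missing values stay visible as NaN in `internet_usage` instead of being spread across year columns.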
msno.matrix(raw_data)