
Analyzing global internet patterns

📖 Background

In this competition, you'll be exploring a dataset that highlights internet usage for different countries from 2000 to 2023. Your goal is to import, clean, analyze, and visualize the data in your preferred tool.

The end goal will be a clean, self-explanatory, and interactive visualization. By conducting a thorough analysis, you'll dive deeper into how internet usage has changed over time and which countries are still widely affected by a lack of internet availability.

💾 Data

You have access to the following file, but you can supplement your data with other sources to enrich your analysis.

Internet Usage (internet_usage.csv)

Column name     Description
Country Name    Name of the country
Country Code    Three-character country code
2000            Percentage of the population using the internet in 2000
2001            Percentage of the population using the internet in 2001
2002            Percentage of the population using the internet in 2002
2003            Percentage of the population using the internet in 2003
...             ...
2023            Percentage of the population using the internet in 2023

The data can be downloaded from the Files section (File > Show workbook files).

💪 Challenge

Use a tool of your choice to create an interesting visual or dashboard that summarizes your analysis!

Things to consider:

  1. Use this Workspace to prepare your data (optional).
  2. Stuck on where to start? Here are some ideas to get you started:
    • Visualize internet usage over time, by country
    • How has internet usage changed over time? Are there any patterns emerging?
    • Consider bringing in other data to supplement your analysis
  3. Create a screenshot of your main dashboard / visuals, and paste in the designated field.
  4. Summarize your findings in an executive summary.
import pandas as pd
data = pd.read_csv("data/internet_usage.csv") 
data.head(10)

✍️ Judging criteria

CATEGORY        WEIGHTING   DETAILS
Visualizations  50%         • Appropriateness of visualizations used.
                            • Clarity of insight from visualizations.
Summary         35%         • Clarity of insights - how clear and well presented the findings are.
Votes           15%         • Up voting - most upvoted entries get the most points.

⌛️ Time is ticking. Good luck!

1. Data Preparation

Clean the data so that it is ready for analysis: load the file, mark '..' entries as missing, convert the year columns to numeric, and fill the remaining gaps.

import pandas as pd

# Loading of the dataset
df = pd.read_csv("data/internet_usage.csv")

# Displaying the first few rows of the dataset
print("Original Data:")
print(df.head())

# Replacing '..' with NaN for proper handling of missing values
df.replace('..', pd.NA, inplace=True)

# Converting the year columns to numeric, forcing errors to NaN
year_columns = df.columns[2:]  # All columns from 2000 onwards
df[year_columns] = df[year_columns].apply(pd.to_numeric, errors='coerce')

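# Checking for missing values
missing_values = df.isnull().sum()
print("\nMissing values per column:")
print(missing_values)

# Forward filling along each row, so a country's gaps are filled from its own
# previous year rather than from the country in the row above.
# Gaps before a country's first recorded year stay missing.
df[year_columns] = df[year_columns].ffill(axis=1)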

# Displaying cleaned data
print("\nCleaned Data:")
print(df.head())

# Saving the cleaned data to a new CSV file
df.to_csv('cleaned_internet_usage.csv', index=False)
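
On a wide table like this one (one row per country, one column per year), the direction of a forward fill matters. The sketch below contrasts the two directions on a toy table; the country names and values are made up for illustration and are not taken from internet_usage.csv.

```python
import pandas as pd

# Toy wide table; the countries and values are made up for illustration
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "2000": [10.0, None],
    "2001": [None, 40.0],
})
years = ["2000", "2001"]

# Column-wise fill (the default axis): gaps are filled from the row above,
# so Country B inherits Country A's value for 2000
down = toy[years].ffill()

# Row-wise fill: gaps are filled from the same country's previous year
across = toy[years].ffill(axis=1)

print(down)
print(across)
```

Filling along the rows keeps each country's series independent, which is usually what is wanted when every row is a separate time series.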

2. Exploratory Data Analysis (EDA)

Descriptive Statistics: get an overview of internet usage by calculating the mean, median (the 50% row), and standard deviation for each year.

print(df[year_columns].describe())
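
If only the three statistics named above are needed, they can also be computed directly and laid out with one row per year. This is a minimal sketch on a toy frame with made-up values; the layout (two identifier columns followed by year columns) is assumed to match the dataset.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; the values are made up for illustration
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "2000": [1.0, 5.0, 30.0],
    "2001": [2.0, 8.0, 40.0],
})
toy_years = toy.columns[2:]

# agg returns one row per statistic; transposing gives one row per year
summary = toy[toy_years].agg(["mean", "median", "std"]).T
print(summary)
```

Having the years as rows makes it easy to plot the mean and median against time later on.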

Distribution of Internet Usage:

Visualize how internet usage varies across countries and years, first with an animated choropleth map and then with histograms for each year.

import pandas as pd
import plotly.express as px

# Load your dataset
data = pd.read_csv('cleaned_internet_usage.csv')

# Melt the dataframe to long format for Plotly, keeping both identifier columns
# so that 'Country Code' is not melted into the 'Year' column
melted_data = data.melt(
    id_vars=['Country Name', 'Country Code'],
    var_name='Year',
    value_name='Internet Usage'
)

# Create the animated map, matching countries by their 3-character code
# (assuming the codes are ISO 3166-1 alpha-3)
fig = px.choropleth(
    melted_data,
    locations='Country Code',
    locationmode='ISO-3',
    color='Internet Usage',
    hover_name='Country Name',
    animation_frame='Year',
    color_continuous_scale=px.colors.sequential.Plasma,
    title='Global Internet Usage Over Time'
)

# Show the figure
fig.show()

Summary Analysis of the Animated Map of Global Internet Usage Over Time

The animated map created using Plotly's px.choropleth function provides a dynamic visualization of global internet usage from the year 2000 to 2023. Here are some key observations and insights from the chart:

Key Observations:

  1. Global Trends:

    • There is a clear upward trend in internet usage across the globe over the years.
    • The animation shows a gradual increase in the percentage of internet users in most countries, reflecting the global expansion of internet access.
  2. Regional Differences:

    • Developed regions such as North America, Europe, and parts of Asia show high internet usage early on and continue to lead in internet penetration rates.
    • Developing regions, including parts of Africa and South Asia, show significant growth over the years but still lag behind developed regions in terms of overall internet usage.
  3. Country-Specific Insights:

    • Countries like the United States, Canada, and most of Western Europe have high internet usage rates throughout the period.
    • Emerging economies such as China and India show rapid growth in internet usage, especially in the later years of the dataset.
    • Some countries exhibit slower growth, which could be due to various factors such as economic challenges, infrastructure limitations, or political issues.
  4. Color Scale Interpretation:

    • The color scale used (Plasma) effectively highlights the differences in internet usage, with lighter colors indicating higher usage and darker colors indicating lower usage.
    • This visual differentiation helps in quickly identifying regions with high and low internet penetration.
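
The country-level observations above can be backed with numbers by ranking countries on how much their usage changed between the first and last year. This is a minimal sketch on toy data; the countries and values are made up, and the column layout is assumed to match the dataset.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; the values are made up for illustration
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"],
    "2000": [1.0, 20.0, 60.0],
    "2023": [30.0, 70.0, 95.0],
})
first_year, last_year = "2000", "2023"

# Change in percentage points between the first and last year, per country
by_country = toy.set_index("Country Name")
change = by_country[last_year] - by_country[first_year]

# Largest gains first
ranked = change.sort_values(ascending=False)
print(ranked)
```

The change here is in percentage points, so a country that starts near 100% cannot rank highly even though its usage is high throughout.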

Insights:

  • Digital Divide: The map highlights the digital divide between different regions and countries. While some countries have achieved near-universal internet access, others are still in the early stages of internet adoption.
  • Policy Implications: The data can be used by policymakers to identify regions that need more investment in internet infrastructure and digital literacy programs.
  • Economic Impact: Increased internet usage is often correlated with economic development. Countries with higher internet penetration rates tend to have better access to information, education, and economic opportunities.
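
One rough way to put a number on the digital divide is to count, for each year, how many countries fall below a usage threshold and how far apart the highest and lowest values are. The sketch below uses toy values, and the 50% threshold is an arbitrary assumption, not a figure from the dataset or the competition.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; the values are made up for illustration
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"],
    "2000": [1.0, 20.0, 60.0],
    "2023": [30.0, 70.0, 95.0],
})
toy_years = ["2000", "2023"]

THRESHOLD = 50.0  # hypothetical cut-off, in percent of population

# Number of countries below the threshold in each year
# (missing values compare as False, so they are not counted)
below = (toy[toy_years] < THRESHOLD).sum()

# Gap between the highest and lowest usage in each year, in percentage points
gap = toy[toy_years].max() - toy[toy_years].min()

print(below)
print(gap)
```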

Conclusion:

The animated map provides a comprehensive overview of the changes in global internet usage over the past two decades. It underscores the progress made in expanding internet access worldwide while also highlighting the ongoing challenges in bridging the digital divide. This visualization serves as a valuable tool for understanding the global landscape of internet usage and can inform future efforts to promote digital inclusion.

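import matplotlib.pyplot as plt

# DataFrame.hist draws one subplot per year, so the title and axis labels
# are set on the figure rather than on the last subplot only
df[year_columns].hist(bins=20, figsize=(15, 10))
fig = plt.gcf()
fig.suptitle('Distribution of Internet Usage by Year')
fig.supxlabel('Internet Usage (%)')
fig.supylabel('Frequency')
fig.tight_layout()
plt.show()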

Summary Analysis of Internet Usage Distribution by Year

The histogram chart above provides a visual representation of the distribution of internet usage percentages across different years. Here are some key insights derived from the chart:

  1. Distribution Shape:

    • The histograms for earlier years (e.g., 2000-2005) tend to show a right-skewed (positively skewed) distribution: most countries are clustered at low internet usage percentages, with a long tail of a few high-usage countries.
    • As the years progress, the bulk of the distribution shifts towards higher percentages and becomes closer to a symmetric, bell-like shape. This suggests an increase in internet adoption globally over time.
  2. Central Tendency:

    • The mean and median of internet usage percentages increase steadily from 2000 to 2023. This indicates a general upward trend in internet penetration worldwide.
  3. Spread and Variability:

    • The spread of the data (range and standard deviation) appears to decrease over time. In the early 2000s, there was a wide range of internet usage percentages, reflecting significant disparities between countries. By 2023, the range has narrowed, suggesting more uniform internet usage across countries.
  4. Outliers:

    • In the early years, there are noticeable outliers with very high internet usage percentages, likely representing highly developed countries with early internet adoption.
    • Over time, the number of outliers decreases, indicating that more countries are catching up in terms of internet usage.
  5. Frequency:

    • The frequency of countries with higher internet usage percentages increases over the years. By 2023, a significant number of countries have high internet usage rates, reflecting widespread internet access.

Conclusion

The histogram analysis reveals a clear trend of increasing internet usage globally from 2000 to 2023. The data shows a shift from low and varied internet usage in the early 2000s to higher and more uniform usage in recent years. This trend highlights the global progress in internet accessibility and the narrowing digital divide between countries.
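
The shape and spread described above can also be checked numerically rather than read off the histograms. This is a minimal sketch on toy data with made-up values: positive skewness means most values are low with a long tail of high ones, and negative skewness means the reverse.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; the values are made up for illustration
toy = pd.DataFrame({
    "2000": [1.0, 2.0, 3.0, 50.0],     # mostly low, one high value
    "2023": [60.0, 80.0, 85.0, 90.0],  # mostly high, one lower value
})
toy_years = ["2000", "2023"]

# Skewness, standard deviation, and interquartile range for each year
spread = pd.DataFrame({
    "skew": toy[toy_years].skew(),
    "std": toy[toy_years].std(),
    "iqr": toy[toy_years].quantile(0.75) - toy[toy_years].quantile(0.25),
})
print(spread)
```

Running the same three measures on the real year columns would show whether the spread actually narrows over time or only shifts upward.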
