
💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the top 5 countries with the highest internet use (by population share)?
  2. How many people had internet access in those countries in 2019?
  3. What are the top 5 countries with the highest internet use for each of the following regions: 'Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union'?
  4. Create a visualization for those five regions' internet usage over time.
  5. What are the 5 countries with the most internet users?
  6. What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
  7. Summarize your findings.

🧑‍⚖️ Judging criteria

CATEGORY (WEIGHTING): DETAILS
Response quality (85%):
  • Accuracy (30%) - The response must be representative of the original data and free from errors.
  • Clarity (25%) - The response must be easy to understand and clearly expressed.
  • Completeness (30%) - The response must be a full report that responds to the question posed.
Presentation (15%):
  • How legible/understandable the response is.
  • How well-formatted the response is.
  • Spelling and grammar.

In the event of a tie, earlier submission time will be used as a tie-breaker.

📘 Rules

To be eligible to win, you must:

  • Submit your response to this problem before the deadline.

All responses must be submitted in English.

Entrants must be:

  • 18+ years old.
  • Allowed to take part in a skill-based competition from their country.

Entrants cannot:

  • Be in a country currently sanctioned by the U.S. government.

XP will be awarded at the end of the competition. Therefore, competition XP will not count towards any daily prizes.

✅ Checklist before submitting your workspace

  • Rename your workspace to make it descriptive of your work. N.B., you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px

people = pd.read_csv('data/people.csv')
broadband = pd.read_csv('data/broadband.csv')
internet = pd.read_csv('data/internet.csv')

1. Top 5 countries with the highest internet use (by population share)

top_five = internet.query('Year == 2019')
top_five = top_five.groupby('Entity')['Internet_Usage'].sum()
top_five = top_five.sort_values(ascending=False).head(5)
top_five

The top 5 countries with the highest internet use in 2019 (by population share) are Bahrain, Qatar, Kuwait, the United Arab Emirates, and Denmark.
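As a cross-check on this ranking, here is a minimal sketch on made-up data. The `toy` frame, its values, and the helper `top_n_by_share` are hypothetical; only the column names `Entity`, `Year`, and `Internet_Usage` come from the notebook. It assumes one row per entity and year, in which case `nlargest` should give the same ranking as grouping and summing, and it makes it easy to exclude aggregate rows such as `World` before ranking.

```python
import pandas as pd

# Hypothetical toy data mirroring the columns used from internet.csv
toy = pd.DataFrame({
    "Entity": ["A", "B", "C", "World", "A"],
    "Year": [2019, 2019, 2019, 2019, 2018],
    "Internet_Usage": [99.0, 80.5, 95.2, 53.0, 97.0],
})

def top_n_by_share(df, year, n=5, exclude=("World",)):
    """Return the n entities with the highest usage share in a given year."""
    # Keep only the requested year and drop aggregate entities
    rows = df[(df["Year"] == year) & (~df["Entity"].isin(exclude))]
    # nlargest sorts descending on the share column and keeps the first n rows
    return rows.nlargest(n, "Internet_Usage").set_index("Entity")["Internet_Usage"]

toy_top = top_n_by_share(toy, 2019, n=2)
print(toy_top)
```

On the real data, the call would presumably be `top_n_by_share(internet, 2019)`. Whether other aggregate entities need to be added to `exclude` depends on what the file contains.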

2. How many people had internet access in those countries in 2019?

# Restrict the user counts to the five countries found above, for 2019 only
countries_users_2019 = people[people['Entity'].isin(top_five.index)]
countries_users_2019 = countries_users_2019[countries_users_2019['Year'] == 2019].sort_values(by='Users', ascending=False)
countries_users_2019 = countries_users_2019[['Year', 'Entity', 'Users']]
countries_users_2019

People with internet access in 2019:

  • United Arab Emirates: 9,133,361
  • Denmark: 5,682,653
  • Kuwait: 4,420,795
  • Qatar: 2,797,495
  • Bahrain: 1,489,735

3. Top 5 countries with the highest internet use for each of the regions

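# Drop rows without a country code, then drop the 'World' entry
new = internet.dropna(axis=0, subset=['Code'])
new = new.query('Entity != "World"')

# Country-to-region lookup: the third table on the statisticstimes.com page
df1 = pd.read_html('https://statisticstimes.com/geography/countries-by-continents.php', thousands=None, decimal=',')
may = df1[2]
may.rename(columns={'ISO-alpha3 Code': 'Code'}, inplace=True)

# Left merge keeps every internet row, with or without a region match
countries = new.merge(may, how="left")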
Africa Eastern and Southern
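# Sub-regions from the lookup table that make up 'Africa Eastern and Southern'
aes = ['Eastern Africa', 'Southern Africa']
aes1 = countries[countries['Region 1'].isin(aes)]
# Note: this cell filters on 2017, whereas the earlier cells use 2019
aes1 = aes1.query('Year == 2017')

top_five_aes = aes1.groupby('Entity')['Internet_Usage'].sum().sort_values(ascending=False).head(5)
top_five_aes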
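Since the same filter-and-rank steps are needed for each remaining region, they could be wrapped in a small helper. This is only a sketch under stated assumptions: the function `top_by_subregions` and the `toy_countries` frame are hypothetical, and the sub-region names that make up each of the other regions would have to be checked against the lookup table. The `Region 1`, `Entity`, `Year`, and `Internet_Usage` column names follow the cells above.

```python
import pandas as pd

# Hypothetical toy data; column names follow the merged `countries` frame
toy_countries = pd.DataFrame({
    "Entity": ["P", "Q", "R", "S"],
    "Year": [2017, 2017, 2017, 2016],
    "Region 1": ["Eastern Africa", "Southern Africa", "Western Africa", "Eastern Africa"],
    "Internet_Usage": [30.0, 56.2, 40.0, 25.0],
})

def top_by_subregions(df, subregions, year, n=5):
    """Rank entities in the given sub-regions by usage share for one year."""
    # Keep rows in the requested sub-regions and year
    rows = df[df["Region 1"].isin(subregions) & (df["Year"] == year)]
    # Same group, sum, sort, head steps as the notebook cell
    ranked = rows.groupby("Entity")["Internet_Usage"].sum()
    return ranked.sort_values(ascending=False).head(n)

toy_aes = top_by_subregions(toy_countries, ["Eastern Africa", "Southern Africa"], 2017)
print(toy_aes)
```

With the real data, the first argument would be `countries`, and the year could be passed explicitly so that every region is ranked on the same year.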