
💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the top 5 countries with the highest internet use (by population share)?
  2. How many people had internet access in those countries in 2019?
  3. What are the top 5 countries with the highest internet use for each of the following regions: 'Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union'?
  4. Create a visualization for those regions' internet usage over time.
  5. What are the 5 countries with the most internet users?
  6. What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
  7. Summarize your findings.

🧑‍⚖️ Judging criteria

Response quality (85%)
  • Accuracy (30%) - The response must be representative of the original data and free from errors.
  • Clarity (25%) - The response must be easy to understand and clearly expressed.
  • Completeness (30%) - The response must be a full report that responds to the question posed.

Presentation (15%)
  • How legible/understandable the response is.
  • How well-formatted the response is.
  • Spelling and grammar.

In the event of a tie, earlier submission time will be used as a tie-breaker.

📘 Rules

To be eligible to win, you must:

  • Submit your response to this problem before the deadline.

All responses must be submitted in English.

Entrants must be:

  • 18+ years old.
  • Allowed to take part in a skill-based competition from their country.

Entrants cannot:

  • Be in a country currently sanctioned by the U.S. government.

XP will be awarded at the end of the competition. Therefore, competition XP will not count towards any daily prizes.

✅ Checklist before submitting your workspace

  • Rename your workspace to make it descriptive of your work. N.B., you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the notebook is focused on your story.
  • Check that all the cells run without error.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px

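# people: number of internet users per country and year ('Users')
people = pd.read_csv('data/people.csv')
# broadband: broadband subscriptions per country and year
broadband = pd.read_csv('data/broadband.csv')
# internet: share of the population using the internet ('Internet_Usage')
internet = pd.read_csv('data/internet.csv')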

1. Top 5 countries with the highest internet use (by population share)

# Internet_Usage is the share of the population using the internet.
# There is one row per entity and year, so no aggregation is needed.
top_five = (internet.query('Year == 2019')
            .set_index('Entity')['Internet_Usage']
            .nlargest(5))
top_five

The top 5 countries with the highest internet use in 2019 (by population share) are Bahrain, Qatar, Kuwait, the United Arab Emirates and Denmark.
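The ranking above assumes every row in the 2019 slice is a country. A minimal sketch of making that explicit, assuming (as the filtering in question 3 suggests) that aggregate entities such as continents have an empty `Code`, while `World` has its own code and must be dropped by name. The helper `top_countries` and all rows below are made up for illustration; they are not values from `internet.csv`.

```python
import pandas as pd

# Made-up example rows, NOT values from internet.csv
sample = pd.DataFrame({
    'Entity': ['World', 'Europe', 'A', 'B', 'C', 'D', 'E', 'F'],
    'Code': ['OWID_WRL', None, 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF'],
    'Year': [2019] * 8,
    'Internet_Usage': [54.0, 99.9, 99.7, 99.6, 99.5, 98.5, 98.1, 90.0],
})

def top_countries(df, year, n=5):
    """Hypothetical helper: the n highest Internet_Usage values among countries in `year`."""
    # Aggregates without an ISO code and the 'World' row are excluded first
    is_country = df['Code'].notna() & (df['Entity'] != 'World')
    countries = df[is_country & (df['Year'] == year)]
    return countries.set_index('Entity')['Internet_Usage'].nlargest(n)

print(top_countries(sample, 2019))
```

Without the filter, the made-up 'Europe' row would take first place, which is the kind of error the filter guards against.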

2. How many people had internet access in those countries in 2019?

# Keep the five countries found above, for 2019 only
in_top_five = people['Entity'].isin(top_five.index) & (people['Year'] == 2019)
countries_users_2019 = people.loc[in_top_five, ['Year', 'Entity', 'Users']]
countries_users_2019 = countries_users_2019.sort_values('Users', ascending=False)
countries_users_2019

People with internet access in 2019:

  • United Arab Emirates: 9,133,361
  • Denmark: 5,682,653
  • Kuwait: 4,420,795
  • Qatar: 2,797,495
  • Bahrain: 1,489,735
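Question 2 can also be answered as a single number. Adding the five 2019 figures listed above gives the combined total. In the notebook, `countries_users_2019['Users'].sum()` should return the same value, assuming the list was copied from that DataFrame.

```python
# 2019 internet users, copied from the list above
users_2019 = {
    'United Arab Emirates': 9_133_361,
    'Denmark': 5_682_653,
    'Kuwait': 4_420_795,
    'Qatar': 2_797_495,
    'Bahrain': 1_489_735,
}

total = sum(users_2019.values())
print(f'{total:,}')  # 23,524,039
```

So roughly 23.5 million people in these five countries used the internet in 2019.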

3. Top 5 countries with the highest internet use for each of the regions

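# Keep rows with an ISO code (countries) and drop the 'World' aggregate
new = internet.dropna(subset=['Code'])
new = new.query('Entity != "World"')

# Scrape a country-to-region lookup; 'Region 1' holds sub-regions such as 'Eastern Africa'
tables = pd.read_html('https://statisticstimes.com/geography/countries-by-continents.php', thousands=None, decimal=',')
regions = tables[2].rename(columns={'ISO-alpha3 Code': 'Code'})

# Attach the region columns to each country-year row via the ISO code
countries = new.merge(regions, on='Code', how='left')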
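The left merge above keeps countries whose code is missing from the scraped table, leaving their `Region 1` empty, so they silently drop out of every regional ranking. A minimal sketch of making the join explicit and checking for unmatched codes with the `on`, `validate` and `indicator` arguments of `DataFrame.merge`; the two small frames are made-up stand-ins for `new` and the scraped region table.

```python
import pandas as pd

# Made-up stand-ins for `new` (usage data) and the scraped region lookup
usage = pd.DataFrame({
    'Entity': ['Country A', 'Country A', 'Country B', 'Country X'],
    'Code': ['AAA', 'AAA', 'BBB', 'XXX'],
    'Year': [2016, 2017, 2017, 2017],
    'Internet_Usage': [20.0, 25.0, 60.0, 40.0],
})
regions = pd.DataFrame({
    'Code': ['AAA', 'BBB'],
    'Region 1': ['Eastern Africa', 'Southern Africa'],
})

# on= names the key explicitly; validate= raises MergeError if a code
# appears twice in the lookup; indicator= adds a '_merge' column
merged = usage.merge(regions, on='Code', how='left',
                     validate='many_to_one', indicator=True)

# Countries with no region would vanish from the regional rankings
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Entity'].unique().tolist()
print(unmatched)  # ['Country X']
```

Printing `unmatched` for the real data would show which countries need a manual region assignment.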
Africa Eastern and Southern
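# Approximate the World Bank region with the closest UN sub-regions in 'Region 1'
aes = ['Eastern Africa', 'Southern Africa']
# Note: this uses 2017, not 2019 as in questions 1 and 2
aes1 = countries[countries['Region 1'].isin(aes) & (countries['Year'] == 2017)]

# One row per country and year, so take the five largest shares directly
top_five_aes = aes1.set_index('Entity')['Internet_Usage'].nlargest(5)
top_five_aes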
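The cell above handles only 'Africa Eastern and Southern', while question 3 lists seven regions. One way to avoid copying the cell seven times is to map each region name from the question to a list of `Region 1` sub-regions and take the top five per group. The `REGION_MAP` below is a hypothetical, deliberately partial mapping, and how closely UN sub-regions match the World Bank groupings would need checking. The helper name and sample rows are also made up.

```python
import pandas as pd

# Hypothetical, partial mapping from the question's region names to the
# sub-region labels in 'Region 1'; not an official crosswalk
REGION_MAP = {
    'Africa Eastern and Southern': ['Eastern Africa', 'Southern Africa'],
    'South Asia': ['Southern Asia'],
}

def top_n_by_region(countries, year, region_map, n=5):
    """Hypothetical helper: the n highest Internet_Usage rows per mapped region."""
    frames = []
    for region, sub_regions in region_map.items():
        subset = countries[countries['Region 1'].isin(sub_regions)
                           & (countries['Year'] == year)]
        top = subset.nlargest(n, 'Internet_Usage')[['Entity', 'Internet_Usage']]
        frames.append(top.assign(Region=region))
    return pd.concat(frames, ignore_index=True)

# Made-up rows for illustration only
sample = pd.DataFrame({
    'Entity': [f'Country {i}' for i in range(8)],
    'Region 1': ['Eastern Africa'] * 4 + ['Southern Africa'] * 2 + ['Southern Asia'] * 2,
    'Year': [2017] * 8,
    'Internet_Usage': [10.0, 35.0, 22.0, 8.0, 56.0, 41.0, 30.0, 15.0],
})

print(top_n_by_region(sample, 2017, REGION_MAP))
```

Extending `REGION_MAP` to all seven regions would produce the whole answer to question 3 in one table, with a `Region` column to tell the groups apart.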