Intermediate Python
Notes And Code By Kyesswa Steven
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")
Dictionaries In Python
Dictionaries are a built-in data type that stores data as key-value pairs, e.g. {'one': 1}.
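A minimal sketch of the everyday dictionary operations (lookup, insert, safe lookup with a default, membership test, delete). The dictionary name numbers and its contents are made up for illustration.

```python
# A small dictionary mapping words to numbers (illustrative data)
numbers = {'one': 1, 'two': 2}

# Look up a value by its key
print(numbers['one'])          # 1

# Add a new key-value pair (or overwrite the value if the key exists)
numbers['three'] = 3

# .get() returns a default instead of raising KeyError for a missing key
print(numbers.get('four', 0))  # 0

# Check whether a key exists
print('two' in numbers)        # True

# Remove a key and its value
del numbers['two']

print(numbers)                 # {'one': 1, 'three': 3}
```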
Importance And Use Cases Of Dictionaries
Efficient Data Retrieval: They use hashing, which lets the value associated with a key be fetched quickly (in constant time on average), regardless of how large the dictionary is.
Data Organization: They can represent structured data in a way that is easy to understand and manipulate. For example, you could use a dictionary to represent a country, with keys for its name, language, currency, etc.
Counting Objects: They are great for counting occurrences of items. For example, you could iterate over the words in a text file and use a dictionary to keep track of how many times each word appears.
Grouping Objects: You can use dictionaries to group objects by some field or property. For example, given a list of students taking different courses, you could create a dictionary that groups them by the course they are taking.
Caching: They can speed up code by caching the results of function calls, keyed by the arguments, so a repeated call with the same arguments skips the computation.
Graph Representation: They are often used to represent graphs, where the keys are the nodes of the graph and the values are lists (or sets) of nodes that are connected to the key node.
# Snippet Code by Kyesswa Steven
#
#
#
# Dictionary Example in Code
dict_compass = {
    'Alfred': 'Business Administration',
    'Steve': 'Computer Science',
    'Sash': 'Mechanical Engineering',
    'Rita': 'Nursing',
    'Kara': 'Political Science',
    'Mark': 'Biology',
    'Abby': 'Psychology',
    'Ethan': 'Environmental Science',
    'Samantha': 'English Literature',
    'Joanna': 'Art History'
}
# Print the objects of the dictionary
for key, value in dict_compass.items():
    print(f"{key}: {value}")
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Create a loop that iterates through the brics DataFrame and prints "The population of {country} is {population} million!".
- Create a histogram of the life expectancies for countries in Africa in the gapminder DataFrame. Make sure your plot has a title, axis labels, and an appropriate number of bins.
- Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If the two dice add up to 2, 3, or 12, print "A loss!". If the two dice add up to any other number, print "Roll again!".
# Code by Kyesswa Steven
#
#
#
# Iterating Through brics DataFrame and printing the population of the country in millions
#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
brics = pd.DataFrame({
'country': ['Uganda', 'Rwanda', 'Zamunda', 'Wadhiya', 'Wakanda'],
'population': [48.58, 14.09, 5.5, 3.7, 1.7] # population in millions
})
for index, row in brics.iterrows():
    print(f"The population of {row['country']} is {row['population']} million!")
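The iterrows() loop above builds a full pandas Series for every row. A commonly used alternative is DataFrame.itertuples(), which yields one namedtuple per row (columns become attributes) and is usually faster. This sketch reuses the same fictional data; the list messages is a name introduced here only so the output can be checked.

```python
import pandas as pd

# Same fictional brics-style data as the cell above (population in millions)
brics = pd.DataFrame({
    'country': ['Uganda', 'Rwanda', 'Zamunda', 'Wadhiya', 'Wakanda'],
    'population': [48.58, 14.09, 5.5, 3.7, 1.7],
})

messages = []
# index=False drops the row index, so each tuple holds only the two columns
for row in brics.itertuples(index=False):
    message = f"The population of {row.country} is {row.population} million!"
    messages.append(message)
    print(message)
```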
# Code by Kyesswa Steven
#
#
#
# Histogram of the life expectancies for countries in Africa in the gapminder DataFrame
# Long live the kingdom of Zamunda
# Also Wakanda Forever
# :)
import pandas as pd
import matplotlib.pyplot as plt
# 'gapminder' DataFrame with real and fictional countries
gapminder = pd.DataFrame({
'continent': ['Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa'],
'country': ['Uganda', 'Rwanda', 'Zamunda', 'Wadhiya', 'Wakanda', 'Kenya', 'Botswana', 'Mozambique', 'Burundi', 'Gabon', 'Liberia', 'Nigeria', 'Ghana', 'Cameroon', 'Madagascar', 'South Africa'],
'lifeExp': [63.37, 68.72, 75.50, 70.30, 100.00, 66.1, 69.86, 59.3, 62, 65.82, 64.1, 55.12, 66.3, 60.32, 66.6, 64.13]
})
# Filter the DataFrame for African countries
africa = gapminder[gapminder['continent'] == 'Africa']
# Create the histogram
plt.hist(africa['lifeExp'], bins=10, edgecolor='black')
# Add title and labels
plt.title('Life Expectancy in African Countries')
plt.xlabel('Life Expectancy')
plt.ylabel('Number of Countries')
# Show the plot
plt.show()
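To judge whether a bin count is "appropriate", it can help to look at the numbers behind the plot. plt.hist bins the data with NumPy's histogram routine, so np.histogram with the same bins returns the bar heights and bin edges it draws. The values below are copied from the gapminder cell above; trying bins='auto' is just one option for letting NumPy choose a bin count.

```python
import numpy as np

# Life expectancies from the gapminder cell above
life_exp = np.array([63.37, 68.72, 75.50, 70.30, 100.00, 66.1, 69.86, 59.3,
                     62, 65.82, 64.1, 55.12, 66.3, 60.32, 66.6, 64.13])

# Counts per bin and the bin edges for 10 equal-width bins
counts, edges = np.histogram(life_exp, bins=10)
print(counts)
print(edges.round(2))

# Let NumPy pick the bin edges from the data instead
auto_edges = np.histogram_bin_edges(life_exp, bins='auto')
print(len(auto_edges) - 1, "bins chosen by 'auto'")
```

Many empty bins (here caused by the single 100.00 value for Wakanda) suggest the range is stretched by an outlier rather than that more bins are needed.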
# Code by Kyesswa Steven
#
#
#
# Simulating 10 rolls of two six-sided dice.
# If the two dice add up to 7 or 11, print "A win!".
# If the two dice add up to 2, 3, or 12, print "A loss!".
# If the two dice add up to any other number, print "Roll again!"
#
#
#
import numpy as np
# Simulating 10 rolls of two six-sided dice
for _ in range(10):
    dice = np.random.randint(1, 7, size=2)
    total = dice.sum()
    # Check the sum of the two dice
    if total in [7, 11]:
        print("A win!")
    elif total in [2, 3, 12]:
        print("A loss!")
    else:
        print("Roll again!")
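A variation on the same simulation, sketched under two choices of my own: the win/loss rules are pulled into a helper function (classify is a hypothetical name) so they can be tested on their own, and NumPy's newer Generator API with a fixed seed makes the 10 rolls repeatable between runs.

```python
import numpy as np

def classify(total):
    """Return the message for a two-dice total, following the rules above."""
    if total in (7, 11):
        return "A win!"
    if total in (2, 3, 12):
        return "A loss!"
    return "Roll again!"

# A seeded Generator gives the same rolls every run (seed 42 is arbitrary)
rng = np.random.default_rng(42)
# 10 rows of two dice; the upper bound 7 is exclusive, so values are 1 to 6
rolls = rng.integers(1, 7, size=(10, 2))
totals = rolls.sum(axis=1)

results = [classify(int(t)) for t in totals]
for dice, result in zip(rolls, results):
    print(dice, "->", result)
```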
Dictionary, DataFrame Notes
Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python.
- The DataFrame is one of Pandas' most important data structures and stores tabular data where you can label the rows and the columns.
- A dictionary is one way to build a DataFrame.
In the exercises that follow, you will be working with vehicle data from different countries. Each observation corresponds to a country, and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.
Three lists are defined in the script:
- names, containing the country names for which data is available.
- dr, a list of booleans that tells whether people drive on the right (True) or on the left (False) in the corresponding country.
- cpc, the number of motor vehicles per 1000 people in the corresponding country.
When a dictionary is used to build a DataFrame, each key becomes a column label and each value is a list that contains that column's elements.
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
# Import pandas as pd
import pandas as pd
# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = { 'country' : names,
'drives_right' : dr,
'cars_per_cap' : cpc
}
# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)
# Print cars
print(cars)
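The notes above say a DataFrame lets you label rows as well as columns, but the cars DataFrame built here keeps the default row labels 0 to 6. A minimal sketch of passing row labels through the index argument of pd.DataFrame; the short country codes in row_labels are hypothetical labels chosen for illustration.

```python
import pandas as pd

# Same pre-defined lists as the exercise
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Hypothetical short row labels, one per country, in the same order as names
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

cars = pd.DataFrame(
    {'country': names, 'drives_right': dr, 'cars_per_cap': cpc},
    index=row_labels,  # row labels; the dictionary keys are the column labels
)
print(cars)

# Rows can now be selected by label with .loc
print(cars.loc['JPN', 'cars_per_cap'])  # 588
```

Assigning cars.index = row_labels after building the DataFrame has the same effect.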
CSV to DataFrame (1)
CSV is short for "comma-separated values".
Putting data in a dictionary and then building a DataFrame works, but it's not very efficient when you are dealing with a large number of observations.
How Does It Work
- To import CSV data into Python as a Pandas DataFrame, you can use read_csv().
Let's explore this function with the same cars data from the previous exercises. This time, however, the data is available in a CSV file named cars.csv. It is available in your current working directory, so the path to the file is simply 'cars.csv'. (In this workspace the file sits in the datasets folder, which is why the code uses 'datasets/cars.csv'.)
Instructions
To import CSV files you still need the pandas package: import it as pd.
Use pd.read_csv() to import cars.csv data as a DataFrame. Store this DataFrame as cars.
Print out cars. Does everything look OK?
# Import pandas as pd
import pandas as pd
# Import the cars.csv data: cars
cars = pd.read_csv("datasets/cars.csv")
# Print out cars
print(cars)
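A self-contained sketch of what goes wrong with a plain read_csv() call. The CSV text below is an assumed stand-in for cars.csv, wrapped in io.StringIO so no file is needed; it assumes the first column holds row labels under an empty header, which pandas names 'Unnamed: 0'.

```python
import io
import pandas as pd

# Assumed layout of cars.csv: row labels in the first column, empty header
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

cars = pd.read_csv(io.StringIO(csv_text))
print(cars)
# The labels became an ordinary column and the rows got default labels 0, 1, 2
print(cars.columns.tolist())  # ['Unnamed: 0', 'cars_per_cap', 'country', 'drives_right']
```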
CSV to DataFrame (2)
Your read_csv() call to import the CSV data didn't generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name.
Remember index_col, an argument of read_csv(), that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!
Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import?
Instructions
Run the code with Run Code and confirm that the first column should actually be used as row labels. Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels. Has the printout of cars improved now?
# Import pandas as pd
import pandas as pd
# Fix import by including index_col
cars = pd.read_csv('datasets/cars.csv', index_col = 0)
# Print out cars
print(cars)
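The effect of index_col=0 can be checked without the real file. This sketch again uses an assumed stand-in for cars.csv (row labels in the first column under an empty header) read through io.StringIO.

```python
import io
import pandas as pd

# Assumed layout of cars.csv: row labels in the first column, empty header
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# index_col=0 turns the first column into the row labels (the index)
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars)
print(cars.index.tolist())          # ['US', 'AUS', 'JPN']
print(cars.loc['AUS', 'country'])   # Australia
```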