Intermediate Python
Run the hidden code cell below to import the data used in this course.
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Create a loop that iterates through the `brics` DataFrame and prints "The population of {country} is {population} million!".
- Create a histogram of the life expectancies for countries in Africa in the `gapminder` DataFrame. Make sure your plot has a title, axis labels, and an appropriate number of bins.
- Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If the two dice add up to 2, 3, or 12, print "A loss!". If the two dice add up to any other number, print "Roll again!".
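One possible way to approach the dice exercise is sketched below. It is only a sketch: the helper name `classify_roll`, the `results` list, and the seed value are choices made for this illustration and are not part of the course material.

```python
import numpy as np


def classify_roll(total):
    """Map the sum of two dice to the message requested in the exercise."""
    if total in (7, 11):
        return "A win!"
    if total in (2, 3, 12):
        return "A loss!"
    return "Roll again!"


# Seed chosen arbitrarily so the run is reproducible (assumption, not from the course)
rng = np.random.default_rng(123)

results = []
for _ in range(10):
    # Two independent dice, each an integer from 1 to 6 (upper bound 7 is exclusive)
    dice = rng.integers(1, 7, size=2)
    total = int(dice.sum())
    message = classify_roll(total)
    results.append((total, message))
    print(message)
```

Keeping the win/loss rule in a small function separates it from the random part, so the rule can be checked on its own with fixed totals.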
pandas:
In the `brics` table, for example, the country and capital are strings. Datasets typically contain several data types, so we need a tool better suited to the job than a NumPy array, which holds a single type. pandas is an open-source, high-level data manipulation library developed by Wes McKinney and built on the NumPy package. It provides high-performance, easy-to-use data structures and data analysis tools for Python, which is why it is widely used by data scientists. In pandas, tabular data such as the `brics` table is stored in an object called a DataFrame.
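As a rough illustration of a DataFrame holding mixed types, the sketch below builds a small `brics`-style table from a dictionary. The column names, row labels, and figures are stand-ins assumed for this example; the real values come from the hidden import cell and may differ.

```python
import pandas as pd

# Illustrative stand-in data (assumed): area in million km², population in millions
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
}

# Each dictionary key becomes a column; the index supplies the row labels
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# String columns and numeric columns live side by side in one object
print(brics)
print(brics.dtypes)
```

Unlike a 2D NumPy array, each column keeps its own data type, so text and numbers do not have to be coerced to a common type.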
import pandas as pd
import matplotlib.pyplot as plt
# Load the survey responses from a CSV file
df = pd.read_csv('datasets/User Experience and System Usability Evaluation Survey (Responses) - Form Responses 1 (1).csv')
sus_scores = df['SUS Score'] # Update column name to match the actual column name in the dataset
# Create a histogram of SUS scores
plt.hist(sus_scores, bins=10, alpha=0.7, color='blue', edgecolor='black')
plt.title('Histogram of SUS Scores')
plt.xlabel('SUS Score')
plt.ylabel('Frequency')
plt.show()

Recap:
- Square brackets: limited functionality
- Ideally, as with 2D NumPy arrays: `my_array[rows, columns]`
- pandas offers:
  - `loc` (label-based)
  - `iloc` (integer position-based)
- Square brackets
  - Column access: `brics[["country", "capital"]]`
  - Row access, only through slicing: `brics[1:4]`
- `loc` (label-based)
  - Row access: `brics.loc[["RU", "IN", "CH"]]`
  - Column access: `brics.loc[:, ["country", "capital"]]`
  - Row and column access: `brics.loc[["RU", "IN", "CH"], ["country", "capital"]]`
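
The access patterns in the recap can be tried out with a short sketch like the one below. The small `brics` table is rebuilt here with only two columns so the example runs on its own; its contents are assumed for illustration, and the variable names on the left are hypothetical.

```python
import pandas as pd

# Minimal stand-in for the brics table (assumed contents, for illustration only)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Square brackets: column access with a list of names, row access via slicing
cols = brics[["country", "capital"]]
rows_slice = brics[1:4]  # positions 1, 2, 3 (end of the slice is exclusive)

# loc: label-based selection of rows, columns, or both
rows_loc = brics.loc[["RU", "IN", "CH"]]
cols_loc = brics.loc[:, ["country", "capital"]]
both_loc = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

# iloc: the same selection expressed with integer positions
both_iloc = brics.iloc[[1, 2, 3], [0, 1]]

print(both_loc)
```

Here `both_loc` and `both_iloc` select the same rows and columns; the only difference is whether they are named by label or by position.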