Intermediate Python
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")
Matplotlib
import matplotlib.pyplot as plt
# Basic (line) plot: plt.plot(x, y)
# Scatter plot: plt.scatter(x, y), useful when you want to check whether two variables are correlated
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
plt.plot(year, pop)
plt.scatter(year, pop)  # optional arguments: s (marker size), c (color), alpha (opacity, from 0 = fully transparent to 1 = solid color)
plt.show()  # displays the plot
Histogram
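The scatter options listed in the comment above can be combined in a single call. The following is a minimal sketch that assumes a non-interactive setting, which is why it selects the Agg backend. You can drop that line in a notebook. The values 80, "red" and 0.5 are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

plt.clf()  # start from an empty figure
# s = marker size (in points squared), c = marker color, alpha = opacity between 0 and 1
points = plt.scatter(year, pop, s=80, c="red", alpha=0.5)
# plt.show()  # uncomment in an interactive session to display the figure
```

`plt.scatter` returns the collection of drawn markers. You can use it to check which settings were applied.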
A histogram is a type of visualization that helps you get an idea of the distribution of a variable. Too few bins oversimplify reality and hide the details. Too many bins overcomplicate reality and hide the bigger picture.
import matplotlib.pyplot as plt
# Create a histogram = plt.hist(x, bins=None)
# Where x is a list of values that you want to build the histogram for
# bins tells Python how many bins to divide the data into (10 by default)
values = [0,0.6,1.4,1.6,2.2,2.5,2.6,3.2,3.5,3.9,4.2,6]
#To control the number of bins to divide your data in, you can set the bins argument.
plt.hist(values, bins=3)
plt.show()
plt.clf() cleans up the current figure again so you can start afresh.
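To see how the number of bins changes the picture, this sketch uses NumPy's `np.histogram` to compute the counts directly. It splits the range from the smallest to the largest value into equal-width bins, which is the same default binning rule `plt.hist` follows. The sketch compares 3 bins with 6 bins for the `values` list above.

```python
import numpy as np

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# np.histogram returns the count in each bin and the bin edges
counts_3, edges_3 = np.histogram(values, bins=3)
counts_6, edges_6 = np.histogram(values, bins=6)

print(counts_3, edges_3)  # [4 6 2] [0. 2. 4. 6.]
print(counts_6, edges_6)  # [2 2 3 3 1 1] [0. 1. 2. 3. 4. 5. 6.]
```

Each bin includes its left edge. The last bin also includes the maximum value, so 6 lands in the final bin. With 3 bins the middle range dominates. With 6 bins you can see that the values thin out above 4.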
Customization
For each visualization, you have many options:
- Different plot types
- Many customizations (colors, shapes, labels, axes)
The choice depends on:
- The type of data
- The story you want to tell
Axis labels
Given year = [1950, 1970, 1990, 2010] and pop = [2.519, 3.692, 5.263, 6.972]:
X label: plt.xlabel('Year')
Y label: plt.ylabel('Population')
import matplotlib.pyplot as plt
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
#Add more data
year = [1800, 1850, 1900] + year
pop = [1.0, 1.262, 1.650] + pop
plt.plot(year,pop)
#Remember to call these functions before calling the show function.
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World population')
plt.yticks([0,2,4,6,8,10]) #Get or set the current tick locations and labels of the y-axis.
#As the values are given in billions, we can also do the following:
plt.yticks([0,2,4,6,8,10],['0','2B','4B','6B','8B','10B'])
#Display the grid
plt.grid(True)
#Add text
plt.text(1800,1.0,"First year")
plt.show()
Dictionaries
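The `plt.yticks` labels above are typed out by hand. The function below, `billions_labels`, is a hypothetical helper and not part of Matplotlib. It builds the same labels from the tick values, so the two lists cannot drift apart.

```python
def billions_labels(ticks):
    """Hypothetical helper: turn tick values (in billions) into labels such as '2B'."""
    # 0 stays a plain '0', matching the hand-written labels used above
    return [str(t) if t == 0 else f"{t}B" for t in ticks]

ticks = [0, 2, 4, 6, 8, 10]
labels = billions_labels(ticks)
print(labels)  # ['0', '2B', '4B', '6B', '8B', '10B']
# In the plotting code you would then call: plt.yticks(ticks, labels)
```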
Using the list index() method to find where a specific element sits is an awkward way to keep track of values in long datasets:
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]
ind_alb = countries.index("albania")  # works, but is neither convenient nor intuitive
ind_alb
To create a dictionary, use curly brackets:
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21} #This is a dictionary
world["albania"]  # to get the population of Albania, pass the string "albania" (keys are case-sensitive)
# In other words, you pass the key in square brackets, [key], and get the corresponding value
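This sketch puts the two approaches side by side using the same lists as above. The list version needs two steps: find the index, then use it. The dictionary version looks the value up by key in one step. Building the dictionary with `dict(zip(...))` is an alternative to typing the curly-bracket literal. The last two lines show that keys are case-sensitive.

```python
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]

# List approach: find the position first, then use it to index the second list
pop_alb = pop[countries.index("albania")]

# Dictionary approach: pair each country (key) with its population (value)
world = dict(zip(countries, pop))
print(world)             # {'afghanistan': 30.55, 'albania': 2.77, 'algeria': 39.21}
print(world["albania"])  # 2.77

# Keys are case-sensitive: "Albania" is not a key in world
print("Albania" in world)    # False
print(world.get("Albania"))  # None (get() returns None instead of raising a KeyError)
```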