Introduction to Data Visualization with Seaborn
Run the hidden code cell below to import the data used in this course.
# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Importing the course datasets
country_data = pd.read_csv('datasets/countries-of-the-world.csv', decimal=",")
mpg = pd.read_csv('datasets/mpg.csv')
student_data = pd.read_csv('datasets/student-alcohol-consumption.csv', index_col=0)
survey = pd.read_csv('datasets/young-people-survey-responses.csv', index_col=0)
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
# Add your code snippets here
# Create scatter plot of horsepower vs. mpg
sns.relplot(x="horsepower", y="mpg",
            data=mpg, kind="scatter",
            size="cylinders", hue="cylinders")
# Show plot
plt.show()
# Create a scatter plot of acceleration vs. mpg
sns.relplot(x="acceleration", y="mpg",
            data=mpg, kind="scatter",
            style="origin", hue="origin")
# Show plot
plt.show()
# Create line plot of model year vs. horsepower
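# ci=None turns off the confidence interval
# (seaborn 0.12+ deprecates ci in favor of errorbar=None)
sns.relplot(x="model_year", y="horsepower",
            data=mpg, kind="line", ci=None)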
# Show plot
plt.show()
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- From `country_data`, create a scatter plot to look at the relationship between GDP and Literacy. Use color to segment the data points by region.
- Use `mpg` to create a line plot with `model_year` on the x-axis and `weight` on the y-axis. Create differentiating lines for each country of origin (`origin`).
- Create a box plot from `student_data` to explore the relationship between the number of failures (`failures`) and the average final grade (`G3`).
- Create a bar plot from `survey` to compare how `Loneliness` differs across values for `Internet usage`. Format it to have two subplots for gender.
- Make sure to add titles and labels to your plots and adjust their format for readability!
# Chapter 3
# Count plot
# Separate into column subplots based on age category
# (uses the survey DataFrame from the first cell; assumes it has an "Age Category" column)
sns.catplot(y="Internet usage", data=survey,
            kind="count",
            col="Age Category")
# Show plot
plt.show()
# Bar plot
# List of categories from lowest to highest
category_order = ["<2 hours",
                  "2 to 5 hours",
                  "5 to 10 hours",
                  ">10 hours"]
# Turn off the confidence intervals
sns.catplot(x="study_time", y="G3",
            data=student_data,
            kind="bar",
            order=category_order,
            ci=None)
# Show plot
plt.show()
# Box plot
# Extend the whiskers to the 0th and 100th percentiles (the min and max values)
sns.catplot(x="romantic", y="G3",
            data=student_data,
            kind="box",
            whis=[0, 100])
# Show plot
plt.show()
# Point plot
# Import median function from numpy
from numpy import median
# Plot the median number of absences instead of the mean
sns.catplot(x="romantic", y="absences",
            data=student_data,
            kind="point",
            hue="school",
            ci=None,
            estimator=median)
# Show plot
plt.show()
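The point plot above swaps the default mean for the median. A small sketch with made-up absence counts (not taken from `student_data`) suggests why that can matter: a single extreme value pulls the mean up but leaves the median where it is.

```python
import numpy as np
import pandas as pd

# Hypothetical absences for five students (made-up values for illustration)
absences = pd.Series([0, 2, 2, 4, 40])

mean_absences = absences.mean()        # pulled upward by the single value of 40
median_absences = np.median(absences)  # the middle value, unaffected by the outlier

print(f"mean: {mean_absences}, median: {median_absences}")
```

When a column such as absences has a few very large values, the median is often the more representative summary to plot.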
# Chapter 4
# Create point plot
sns.catplot(x="origin",
            y="acceleration",
            data=mpg,
            kind="point",
            join=False,
            capsize=0.1)
# Rotate x-tick labels
plt.xticks(rotation=90)
# Show plot
plt.show()
# Set the figure style to "dark"
sns.set_style('dark')
# Adjust to add subplots per gender
# (uses the survey DataFrame from the first cell; assumes it has a "Likes Techno" column)
g = sns.catplot(x="Village - town", y="Likes Techno",
                data=survey, kind="bar",
                col="Gender")
# Add title and axis labels
g.fig.suptitle("Percentage of Young People Who Like Techno", y=1.02)
g.set(xlabel="Location of Residence",
      ylabel="% Who Like Techno")
# Show plot
plt.show()