Intermediate Data Visualization with Seaborn
Run the hidden code cell below to import the data used in this course.
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Use lmplot() to look at the relationship between temp and total_rentals from bike_share. Plot two regression lines for working and non-working days (workingday).
- Create a heat map from daily_show to see how the types of guests (Group) have changed yearly.
- Explore the variables from insurance and their relationships by creating pairwise plots and experimenting with different variables and types of plots. You can also use color to segment the data visually by region.
- Make sure to add titles and labels to your plots and adjust their format for readability!
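As a starting point for the heat map task, here is a minimal sketch. It assumes daily_show has one row per guest with a year column and a Group column; the small guests DataFrame below is a hypothetical stand-in with made-up rows, and the column name YEAR is an assumption rather than something confirmed by the course data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for daily_show; real column names and values may differ.
guests = pd.DataFrame({
    "YEAR": [1999, 1999, 1999, 2000, 2000, 2001, 2001, 2001],
    "Group": ["Acting", "Acting", "Media", "Acting",
              "Politician", "Media", "Media", "Politician"],
})

# Count guests per (Group, YEAR) pair; combinations that never occur become 0.
guest_counts = pd.crosstab(guests["Group"], guests["YEAR"])

# Draw the counts as a heat map and label it for readability.
ax = sns.heatmap(guest_counts, annot=True, fmt="d", cbar=False, linewidths=0.3)
ax.set_title("Guests by group and year")
ax.set_xlabel("Year")
ax.set_ylabel("Guest group")
plt.show()
```

pd.crosstab() does the reshaping here, so sns.heatmap() only has to color a ready-made table of counts. With the real data, replacing guests with daily_show should be the only change needed if the column names match.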
Rug plot and kde shading
Now that you understand some of the function arguments for displot(), we can continue refining the output. Creating a visualization and then updating it incrementally is a useful and common way to look at data from multiple perspectives.
Seaborn excels at making this process simple.
Instructions
- Create a displot of the Award_Amount column in df.
- Configure it to show a shaded kde plot (using the kind and fill parameters).
- Add a rug plot above the x axis (using the rug parameter).
- Display the plot.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('datasets/schoolimprovement2010grants.csv')
# Create a displot of the Award Amount
sns.displot(df['Award_Amount'],
            kind='kde',
            rug=True,
            fill=True)
# Plot the results
plt.show()
Create a regression plot
For this set of exercises, we will be looking at FiveThirtyEight's data on which US state has the worst drivers. The data set includes summary-level information about fatal accidents as well as insurance premiums for each state as of 2010.
In this exercise, we will look at the difference between the regression plotting functions.
Instructions
The data is available in the DataFrame called df. Create a regression plot using regplot() with "insurance_losses" on the x axis and "premiums" on the y axis.
df = pd.read_csv('datasets/insurance_premiums.csv')
# Create a regression plot of premiums vs. insurance_losses
sns.regplot(x='insurance_losses', y='premiums', data=df)
# Display the plot
plt.show()
# Create an lmplot of premiums vs. insurance_losses
sns.lmplot(x='insurance_losses', y='premiums', data=df)
# Display the second plot
plt.show()
# Create a regression plot using hue
sns.lmplot(x='insurance_losses', y='premiums', data=df,
           hue="Region")
# Show the results
plt.show()
# Create a regression plot with multiple rows
sns.lmplot(data=df,
           x="insurance_losses",
           y="premiums",
           row="Region")
# Show the plot
plt.show()
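The main practical difference between the two regression functions used above is what they return: regplot() is axes-level and returns a matplotlib Axes, while lmplot() is figure-level and returns a seaborn FacetGrid, which is what makes hue and row faceting possible. The sketch below illustrates this with a synthetic drivers DataFrame (made-up values, not the FiveThirtyEight data) that reuses the exercise's column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the insurance data (values are made up).
rng = np.random.default_rng(1)
losses = rng.uniform(80, 200, size=40)
drivers = pd.DataFrame({
    "insurance_losses": losses,
    "premiums": 500 + 3 * losses + rng.normal(0, 40, size=40),
    "Region": np.repeat(["West", "South"], 20),
})

# regplot() is axes-level: it draws on the Axes it is given and returns it.
fig, ax = plt.subplots()
reg_ax = sns.regplot(data=drivers, x="insurance_losses", y="premiums", ax=ax)

# lmplot() is figure-level: it builds its own figure and returns a FacetGrid,
# so it can split the data by hue (one facet) or by row (one facet per region).
hue_grid = sns.lmplot(data=drivers, x="insurance_losses", y="premiums", hue="Region")
row_grid = sns.lmplot(data=drivers, x="insurance_losses", y="premiums", row="Region")

# The grid shape reflects the faceting: (rows, columns).
print(hue_grid.axes.shape, row_grid.axes.shape)
plt.show()
```

Because regplot() works on a single Axes, it fits easily into an existing matplotlib layout; lmplot() is the more convenient choice when the comparison across groups is the point of the plot.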
Create and display a palette with 10 colors using the husl system.
sns.palplot(sns.color_palette('husl', 10))
plt.show()
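It can help to look at what color_palette() returns before plotting it. The following minimal sketch inspects the same 10-color husl palette; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# color_palette() returns a list-like object of (r, g, b) tuples.
palette = sns.color_palette("husl", 10)

# as_hex() converts the same colors to hex strings, handy for reuse elsewhere.
hex_codes = palette.as_hex()

print(len(palette))   # number of colors requested
print(hex_codes[:3])  # first few colors as hex strings

# palplot() draws the palette as a row of color swatches.
sns.palplot(palette)
plt.show()
```

The same palette object can also be passed to the palette argument of most seaborn plotting functions.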