
Introduction to Data Visualization with Seaborn

Run the hidden code cell below to import the data used in this course.

# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Importing the course datasets
country_data = pd.read_csv('datasets/countries-of-the-world.csv', decimal=",")
mpg = pd.read_csv('datasets/mpg.csv')
student_data = pd.read_csv('datasets/student-alcohol-consumption.csv', index_col=0)
survey = pd.read_csv('datasets/young-people-survey-responses.csv', index_col=0)

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • From country_data, create a scatter plot to look at the relationship between GDP and Literacy. Use color to segment the data points by region.
  • Use mpg to create a line plot with model_year on the x-axis and weight on the y-axis. Create differentiating lines for each country of origin (origin).
  • Create a box plot from student_data to explore the relationship between the number of failures (failures) and the average final grade (G3).
  • Create a bar plot from survey to compare how Loneliness differs across values for Internet usage. Format it to have two subplots for gender.
  • Make sure to add titles and labels to your plots and adjust their format for readability!

country_data.head()
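
As a rough sketch of the first exercise above (scatter plot with color by region, plus a title and axis labels), the cell below uses a tiny made-up DataFrame so it runs without the course files. The column name 'Literacy (%)' and all values in toy are assumptions for illustration; swap in country_data and check its actual column names with country_data.columns first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in rows (made-up values), shaped like country_data
toy = pd.DataFrame({
    "GDP ($ per capita)": [700, 4500, 6000, 27000, 19000, 1900],
    "Literacy (%)": [36.0, 86.5, 70.0, 100.0, 97.0, 42.0],
    "Region": ["ASIA", "EUROPE", "AFRICA", "EUROPE", "ASIA", "AFRICA"],
})

# Scatter plot with one color per region (hue adds the third variable)
fig, ax = plt.subplots()
sns.scatterplot(x="GDP ($ per capita)", y="Literacy (%)",
                hue="Region", data=toy, ax=ax)

# Titles and labels are set on the Matplotlib Axes object
ax.set_title("Literacy vs. GDP by region")
ax.set_xlabel("GDP ($ per capita)")
ax.set_ylabel("Literacy (%)")

# Show plot
plt.show()
```

Passing column names together with data= (rather than lists) is what lets hue pick up the Region column.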

GETTING STARTED

# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Extract columns from country_data into lists
gdp = country_data['GDP ($ per capita)'].tolist()
phones = country_data['Phones (per 1000)'].tolist()

# Create scatter plot with GDP on the x-axis and number of phones on the y-axis
sns.scatterplot(x=gdp, y=phones)

# Show plot
plt.show()

# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create a list of regions
region = country_data['Region'].tolist()

# Create count plot with region on the y-axis
sns.countplot(y=region)

# Show plot
plt.show()

MAKING COUNTPLOT

# Import Matplotlib, pandas, and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd


# Create a DataFrame from csv file
df = pd.read_csv('csv_filepath')

# Create a count plot with "Spiders" on the x-axis
sns.countplot(x="Spiders", data=df)

# Display the plot
plt.show()

HUE and ADDING 3RD VARIABLE

# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Change the legend order in the scatter plot
sns.scatterplot(x="absences", y="G3", 
                data=student_data, 
                hue="location", hue_order=["Rural", "Urban"])

# Show plot
plt.show()


# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create a dictionary mapping subgroup values to colors
palette_colors = {"Rural": "green", "Urban": "blue"}

# Create a count plot of school with location subgroups
sns.countplot(x="school", data=student_data, hue="location", palette=palette_colors)



# Display plot
plt.show()

RELATIONAL PLOTS - relplot() can take the place of scatterplot() and lineplot() in Seaborn because its kind parameter can be set to "scatter" or "line" as needed