
Introduction to Seaborn

What is Seaborn

A Python data visualization library
Easily create the most common types of plots

Why is Seaborn useful?

Data visualization is a huge component of both data exploration and the communication of results

Advantages of Seaborn

Easy to use
Works well with pandas data structures
Built on top of Matplotlib

Import Seaborn

import seaborn as sns
import matplotlib.pyplot as plt  # needed to display the plots with plt.show()

sns.scatterplot(x='column_name', y='column_name', data=df)
sns.countplot(x='column_name', data=df)
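
As a minimal sketch of the two calls above, both functions also accept plain Python lists. The values below are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative values, not from a real dataset
height = [62, 64, 69, 75, 66, 68, 65, 71]
weight = [120, 136, 148, 175, 137, 165, 154, 172]
animal = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]

# Scatter plot: one point per (height, weight) pair
ax = sns.scatterplot(x=height, y=weight)

# Count plot on a new figure: one bar per distinct value in the list
plt.figure()
ax_count = sns.countplot(x=animal)

# plt.show()  # uncomment to display the figures when running as a script
```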

Using pandas with Seaborn

What is pandas?

Python library for data analysis
Easily read datasets from CSV, text, and other types of files
Datasets take the form of DataFrame objects

Using DataFrames with countplot()

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('csvfile.csv')
sns.countplot(x='column_name', data=df)

Seaborn works well with "tidy data", which means that each observation has its own row and each variable has its own column
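
A small sketch of what tidy data looks like in practice. The DataFrame is built inline (its column names and values are invented for illustration) so the example does not depend on a CSV file.

```python
import pandas as pd
import seaborn as sns

# Tidy data: each row is one observation, each column is one variable
df = pd.DataFrame({
    "participant": [1, 2, 3, 4, 5, 6],
    "age": [23, 31, 27, 45, 38, 29],
    "favorite_animal": ["cat", "dog", "dog", "bird", "dog", "cat"],
})

# Because the data is tidy, a column name is all countplot() needs
ax = sns.countplot(x="favorite_animal", data=df)

# The bar heights correspond to these counts
counts = df["favorite_animal"].value_counts()
print(counts)
```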

Adding a third variable with hue

A scatter plot with hue

tips = sns.load_dataset('dataset')
# hue_order sets the order of the hue values in the plot and its legend
sns.scatterplot(x='column_name', y='column_name', data=tips, hue='column_name', hue_order=['value', 'value'])

Specifying hue colors

hue_colors = {'value': 'color', ...}
sns.scatterplot(x='column_name', y='column_name', data=tips, hue='column_name', hue_order=['value', 'value'], palette=hue_colors)

Using hue with countplots

sns.countplot(x='column', data=df,hue='column_name')
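
A hedged sketch combining hue, hue_order, and a palette dictionary. The DataFrame stands in for a real dataset; its column names and values are assumptions made for illustration.

```python
import pandas as pd
import seaborn as sns

# Invented restaurant-bill data
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes"],
})

# Map each hue value to a color, and fix the legend order
hue_colors = {"Yes": "black", "No": "red"}
ax = sns.scatterplot(
    x="total_bill",
    y="tip",
    data=df,
    hue="smoker",
    hue_order=["Yes", "No"],
    palette=hue_colors,
)
```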

Introduction to relational plots and subplots

Introducing relplot()

relplot() stands for "relational plot" and lets you visualize the relationship between two quantitative variables using either scatter plots or line plots
Lets you create subplots in a single figure
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', col='column_name', row='column_name')
kind is either 'scatter' or 'line'
col= arranges the subplots side by side in columns and row= stacks them in rows, with one subplot per value of the given column; both can be used together
col_wrap=int sets the maximum number of subplots per row when only col= is used
col_order=['value', 'value'] is a list that sets the order of the subplots created by col=
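
A minimal sketch of subplots with relplot(), assuming an invented DataFrame with a categorical "day" column. col_wrap=2 wraps the four subplots into a 2x2 layout.

```python
import pandas as pd
import seaborn as sns

# Invented data: two observations per day
df = pd.DataFrame({
    "total_bill": [12.5, 18.0, 25.3, 30.1, 14.2, 20.7, 27.9, 33.4],
    "tip": [2.0, 3.0, 4.1, 5.0, 2.5, 3.2, 4.4, 5.5],
    "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
})

# One subplot per day, at most two subplots per row, in a fixed order
g = sns.relplot(
    x="total_bill",
    y="tip",
    data=df,
    kind="scatter",
    col="day",
    col_wrap=2,
    col_order=["Thur", "Fri", "Sat", "Sun"],
)
```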

Customizing scatter plots

Subgroups with point size and hue

sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', size='column_name', hue='column_name')

Subgroups with point style

sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', hue='column_name', style='column_name')

Changing point transparency

sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', alpha=0.4)
alpha is a float between 0 (fully transparent) and 1 (fully opaque)
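
The three customizations can be combined in one call. This is a sketch on invented data; the column named "size" (party size) is an assumption for illustration.

```python
import pandas as pd
import seaborn as sns

# Invented data; "size" is the number of people at the table
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "size": [2, 3, 3, 2, 4, 4],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes"],
})

g = sns.relplot(
    x="total_bill",
    y="tip",
    data=df,
    kind="scatter",
    size="size",     # larger points for larger parties
    hue="size",      # same variable as color, which makes sizes easier to read
    style="smoker",  # a different marker per subgroup
    alpha=0.6,       # partly transparent points
)
```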

Introduction to lineplots

What are lineplots

The visualization of choice when we need to track the same thing over time
sns.relplot(x='column_name', y='column_name', data=df, kind='line', hue='column_name', style='column_name', markers=True, dashes=False)
markers=True displays a marker for each data point
dashes=False draws all the lines with the same line style
If a line plot is given multiple observations per x-value, it aggregates them into a single summary measure. By default it displays the mean, with a shaded region around the line showing the 95% confidence interval, which indicates the uncertainty in our estimate

Replacing confidence interval with std deviation

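sns.relplot(x='column_name', y='column_name', data=df, kind='line', ci='sd')
ci='sd' shades one standard deviation instead, which shows the spread of the distribution of observations at each x-value
In seaborn 0.12 and later the ci parameter is deprecated in favor of errorbar (for example errorbar='sd')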

Turning off confidence interval

sns.relplot(x='column_name', y='column_name', data=df, kind='line', ci=None)

Count plots and bar plots

Categorical plots

Examples: count plots, bar plots
Involve a categorical variable, which consists of a fixed, typically small number of possible values or categories
Used for comparisons between groups

catplot()

Used to create categorical plots
Same advantages as relplot()
Easily create subplots with col= and row=
sns.catplot(x='column_name', data=df, kind='count')

Changing the order

First we create a list with the order that we want, then we pass it to the order parameter
category_order = ['value', 'value']
sns.catplot(x='column_name', data=df, kind='count', order=category_order)
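
A short sketch of a count plot with a custom category order. The survey answers are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Invented survey answers
df = pd.DataFrame({
    "answer": ["Often", "Rarely", "Sometimes", "Often", "Never", "Often", "Sometimes"],
})

# Without order=, the bars would not follow this logical sequence
category_order = ["Never", "Rarely", "Sometimes", "Often"]
g = sns.catplot(x="answer", data=df, kind="count", order=category_order)
```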

Bar plots

Displays the mean of a quantitative variable among the observations in each category
sns.catplot(x='column_name', y='column_name', data=df, kind='bar')

Confidence intervals

Lines show 95% confidence intervals for the mean
They show the uncertainty about our estimate
They assume our data is a random sample
To turn the confidence intervals off, set ci=None:
sns.catplot(x='column_name', y='column_name', data=df, kind='bar', ci=None)
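
As a sketch of what the bars represent, the example below draws a bar plot on invented data and computes the same per-category means with pandas. The tallest bar should match the largest mean.

```python
import pandas as pd
import seaborn as sns

# Invented bills: two observations per day
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10, 20, 30, 50, 15, 25],
})

# Each bar shows the mean total_bill for one day
g = sns.catplot(x="day", y="total_bill", data=df, kind="bar")

# The bar heights, computed by hand
means = df.groupby("day")["total_bill"].mean()
heights = [p.get_height() for p in g.ax.patches]
print(means)
```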

Changing the orientation

Switch the x and y parameters
Common practice is to put the categorical variable on the x-axis

Box plots

What is a box plot

Shows the distribution of quantitative data
The colored box represents the 25th to 75th percentile, and the line in the middle represents the median
The whiskers give a sense of the spread of the distribution, and the floating points represent outliers
Commonly used to compare the distribution of a quantitative variable across different groups of a categorical variable

How to create a box plot

sns.catplot(x='column_name', y='column_name', data=df, kind='box', order=['value', 'value'])

Omitting the outliers using sym

sns.catplot(x='column_name', y='column_name', data=df, kind='box', sym='')

Changing the whisker using whis

By default the whiskers extend to 1.5 times the interquartile range (IQR)
whis=2.0 extends the whiskers to 2.0 times the IQR instead
whis=[5, 95] extends the whiskers to the 5th and 95th percentiles
whis=[0, 100] extends the whiskers to the min and max values
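
A worked sketch of the default 1.5 x IQR rule on invented values. With the default setting, the whiskers stop at the most extreme observations inside the fences computed below, and anything outside is drawn as an outlier. The final call uses whis=[0, 100] so the whiskers reach the min and max instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Invented values with one obvious outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 30])

# The box spans the 25th to 75th percentile
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Default whisker limits: 1.5 times the IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, q3, iqr, outliers)

# Whiskers extended to the min and max, so no point is drawn as an outlier
df = pd.DataFrame({"group": "A", "value": values})
g = sns.catplot(x="group", y="value", data=df, kind="box", whis=[0, 100])
```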

Point plots

What are point plots?

Show the mean of a quantitative variable for the observations in each category, plotted as a single point
Vertical lines show 95% confidence intervals

Point plots vs line plots

Both show:
  Mean of a quantitative variable
  95% confidence intervals
Differences:
  Line plot has a quantitative variable on the x-axis
  Point plot has a categorical variable on the x-axis

Creating a point plot

sns.catplot(x='column_name', y='column_name', data=df, kind='point', hue='column_name')

Disconnecting the points

sns.catplot(x='column_name', y='column_name', data=df, kind='point', hue='column_name', join=False)

Display the median

from numpy import median
sns.catplot(x='column_name', y='column_name', data=df, kind='point', estimator=median)
We may prefer the median over the mean because it is more robust to outliers, so if the data has a lot of outliers the median may be a better statistic to use
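
A sketch of why the median can be the better choice, using invented data in which group A contains one extreme value. The outlier pulls the mean of A far from its typical values, while the median barely moves.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data: group A has one outlier (100)
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [1, 2, 3, 4, 100, 2, 3, 4, 5, 6],
})

# Compare both statistics per group
summary = df.groupby("group")["value"].agg(["mean", "median"])
print(summary)

# Point plot showing the median of each group instead of the mean
g = sns.catplot(x="group", y="value", data=df, kind="point", estimator=np.median)
```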

Customizing the confidence intervals

sns.catplot(x='column_name', y='column_name', data=df, kind='point', capsize=0.2)

Changing plot style and color

Why customize?

Personal preference
Improve readability
Guide interpretation

Changing the figure style

Figure style includes the background and axes
Preset options: 'white', 'dark', 'whitegrid', 'darkgrid', 'ticks'
Default is 'white'
sns.set_style() sets the style globally, so call it before generating the plot
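
A minimal sketch of changing the style. set_style() works by updating Matplotlib's global settings, so it affects every plot created afterwards. The plotted values are made up.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style first, then create the plot
sns.set_style("whitegrid")

# 'whitegrid' turns on the background grid in Matplotlib's settings
grid_enabled = plt.rcParams["axes.grid"]

ax = sns.scatterplot(x=[1, 2, 3, 4], y=[2, 4, 3, 5])
```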

Changing the palette

Figure "palette" changes the color of the main elements of the plot
sns.set_palette()
Use preset palettes or create a custom palette

Diverging palettes

Seaborn has a group of preset palettes called diverging palettes, which are great to use if your visualization deals with a scale where the two ends are opposites and there is a neutral midpoint
Palettes: 'RdBu', 'PRGn'; append '_r' (for example 'RdBu_r') to reverse a palette
Like before, call sns.set_palette() before generating the plot

Sequential palettes

A single color, or two colors blended, moving from light to dark values
Great for emphasizing a variable on a continuous scale

Custom palettes

We can create our own palette by building a list of colors (names or hex codes) and passing it as the argument to sns.set_palette()
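
A sketch of a preset diverging palette followed by a custom one. The hex codes and plotted values are arbitrary choices for illustration.

```python
import seaborn as sns

# Preset diverging palette, reversed with the "_r" suffix
sns.set_palette("RdBu_r")
ax = sns.countplot(x=["low", "mid", "high", "mid", "high", "high"])

# Inspect a preset palette as a list of colors
diverging = sns.color_palette("RdBu", 5)

# Custom palette: any list of color names or hex codes
custom_palette = ["#39A7D0", "#36ADA4"]
sns.set_palette(custom_palette)

# The active palette now has exactly the two custom colors
current = sns.color_palette()
```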

Changing the scale

Figure "context" changes the scale of the plot elements and labels
sns.set_context()
Scale options from smallest to largest: 'paper', 'notebook', 'talk', 'poster'
Default is 'paper'
Call it before generating the plot
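
To see what a context changes, this sketch compares the font size seaborn uses for the smallest and largest presets, then applies 'talk' before plotting. The plotted values are made up.

```python
import seaborn as sns

# plotting_context() returns the settings a context would apply
paper_font = sns.plotting_context("paper")["font.size"]
poster_font = sns.plotting_context("poster")["font.size"]
print(paper_font, poster_font)

# Apply a larger scale, for example for slides, then create the plot
sns.set_context("talk")
ax = sns.scatterplot(x=[1, 2, 3, 4], y=[2, 4, 3, 5])
```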

Adding titles and labels: Part 1

FacetGrid vs AxesSubplot objects

Seaborn plots create two different types of objects: FacetGrid and AxesSubplot
To figure out which type of object you're working with, first assign the plot output to a variable, commonly named g, then check its type:
type(g)

An empty FacetGrid

A FacetGrid consists of one or more AxesSubplot objects, which is how it supports subplots
relplot() and catplot() create FacetGrid objects
scatterplot(), countplot(), etc. create AxesSubplot objects
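
A quick sketch for checking which kind of object a plotting function returns, on invented data. Note that recent Matplotlib versions report the single-plot type as Axes rather than AxesSubplot; the isinstance check below works in either case.

```python
import matplotlib.axes
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 5]})

# Figure-level function: returns a FacetGrid
g = sns.relplot(x="x", y="y", data=df, kind="scatter")
print(type(g))

# Axes-level function, drawn on a new figure: returns a single Axes object
plt.figure()
ax = sns.scatterplot(x="x", y="y", data=df)
print(type(ax))
```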

Adding a title to FacetGrid

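First assign the plot to the variable g
g.fig.suptitle('Name of the plot', y=1.03)
y is a float that sets the height of the title; a value slightly above 1 places it just above the plot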

Adding titles and labels: Part 2

Adding a title to AxesSubplot

Assign the plot to the variable g
g.set_title('Name of the plot', y=1.03)
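
A minimal sketch for the single-plot case, using invented data and an invented title.

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"day": ["Thur", "Fri", "Fri", "Sat", "Sat", "Sat"]})

# countplot() returns a single Axes object, so set_title() is available
g = sns.countplot(x="day", data=df)
g.set_title("Number of visits per day", y=1.03)
```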

Titles for subplots

With several subplots, g.fig.suptitle() gives a title to the figure as a whole
To give each subplot its own title we use g.set_titles(); the placeholder {col_name} is replaced by that subplot's value of the column passed to col=
g.set_titles('This is {col_name}')

Adding axis labels

Assign the plot to the variable g
g.set(xlabel='Name', ylabel='Name')

Rotating x-axis tick labels

Assign the plot to the variable g
plt.xticks(rotation=90)
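
The title and label calls for a FacetGrid can be sketched together as follows. The data, titles, and labels are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented data with a categorical "time" column used for the subplots
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
})

g = sns.relplot(
    x="total_bill", y="tip", data=df, kind="scatter",
    col="time", col_order=["Lunch", "Dinner"],
)

# Title for the figure as a whole, raised slightly above the subplots
g.fig.suptitle("Tips by meal time", y=1.03)

# One title per subplot; {col_name} becomes "Lunch" or "Dinner"
g.set_titles("This is {col_name}")

# Axis labels for every subplot
g.set(xlabel="Total bill", ylabel="Tip")

# Rotate the x-axis tick labels (plt.xticks acts on the current Axes)
plt.xticks(rotation=90)
```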

Putting it all together

Getting started

import seaborn as sns
import matplotlib.pyplot as plt
To show: plt.show()

Relational plots

Show the relationship between two quantitative variables
Examples: scatter plots, line plots
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter')  # or kind='line'

Categorical plots

Show the distribution of a quantitative variable within categories defined by a categorical variable
Examples: bar plots, count plots, box plots, point plots
sns.catplot(x='column_name', y='column_name', data=df, kind='bar')  # or 'box', 'point'; kind='count' takes only x

Adding a third variable (hue)

Setting hue will create subgroups that are displayed as different colors on a single plot

Adding a third variable (row/col)

Setting row and/or col in relplot() or catplot() will create subgroups that are displayed on separate subplots

Customization

Change the background: sns.set_style()
Change the main element colors: sns.set_palette()
Change the scale: sns.set_context()

Adding a title

FacetGrid: g.fig.suptitle()
AxesSubplot: g.set_title()

Final touches

Add x- and y-axis labels: g.set(xlabel='name', ylabel='name')
Rotate x-tick labels: plt.xticks(rotation=90)