Introduction to Seaborn
What is Seaborn?
A Python data visualization library
Makes it easy to create the most common types of plots
Why is Seaborn useful?
Data visualization is a major part of both data exploration and the communication of results
Advantages of Seaborn
Easy to use
Works well with pandas data structures
Built on top of Matplotlib
Import Seaborn
import seaborn as sns
import matplotlib.pyplot as plt  # needed to display the plots with plt.show()
sns.scatterplot(x='column_name', y='column_name', data=df)
sns.countplot(x='column_name', data=df)
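Besides DataFrame columns, both functions also accept plain Python lists. The following is a minimal sketch of that usage; the sample values and the variable names (height, weight, gender) are made up for illustration, and the "Agg" backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up sample values, for illustration only
height = [62, 64, 69, 75, 66, 68, 65, 71]
weight = [120, 136, 148, 175, 137, 165, 154, 172]
gender = ["Female", "Female", "Female", "Male", "Male", "Male", "Female", "Male"]

fig, (ax_scatter, ax_count) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter plot: one point per (height, weight) pair
sns.scatterplot(x=height, y=weight, ax=ax_scatter)

# Count plot: one bar per category, bar height = number of occurrences
sns.countplot(x=gender, ax=ax_count)
```

In an interactive session, plt.show() would display the figure.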
Using pandas with Seaborn
What is pandas?
A Python library for data analysis
Easily reads datasets from CSV, text, and other types of files
Datasets take the form of DataFrame objects
Using DataFrames with countplot()
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('csvfile.csv')
sns.countplot(x='column_name', data=df)
Seaborn works best with "tidy data", which means that each observation has its own row and each variable has its own column
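To make the tidy-data idea concrete, here is a small sketch that reshapes an untidy table (one column per survey question) into a tidy one and then counts the answers. The DataFrame contents and the column names (participant, q1, q2, question, answer) are hypothetical, and pandas melt() is just one possible way to do the reshaping.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Untidy: each question has its own column, so one row holds two observations
untidy = pd.DataFrame({
    "participant": [1, 2, 3],
    "q1": ["yes", "no", "yes"],
    "q2": ["no", "yes", "yes"],
})

# Tidy: one row per (participant, question) observation, one column per variable
tidy = untidy.melt(id_vars="participant", var_name="question", value_name="answer")

# countplot() can now count the values of a single column
ax = sns.countplot(x="answer", data=tidy)
```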
Adding a third variable with hue
A scatter plot with hue
tips = sns.load_dataset('dataset')
sns.scatterplot(x='column_name', y='column_name', data=tips, hue='column_name', hue_order=['value', 'value'])
hue_order sets the order in which the hue values appear in the plot and its legend
Specifying hue colors
hue_colors = {'value': 'color', ...}
sns.scatterplot(x='column_name', y='column_name', data=tips, hue='column_name', hue_order=['value', 'value'], palette=hue_colors)
palette takes a dictionary that maps each hue value to a color
Using hue with countplots
sns.countplot(x='column_name', data=df, hue='column_name')
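The hue options above can be combined in one call. This is a sketch under assumptions: the DataFrame is made up, and the column names (total_bill, tip, smoker) and the colors are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 8.9, 25.4],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0],
    "smoker": ["Yes", "No", "No", "Yes", "No", "Yes"],
})

# Map each hue value to a color
hue_colors = {"Yes": "black", "No": "red"}

# hue colors the points by subgroup; hue_order fixes the legend order
ax = sns.scatterplot(
    x="total_bill", y="tip", data=df,
    hue="smoker", hue_order=["Yes", "No"], palette=hue_colors,
)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```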
Introduction to relational plots and subplots
Introducing relplot()
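relplot() stands for "relational plot"
It lets you visualize the relationship between two quantitative variables using either scatter plots or line plots
It lets you create subplots in a single figure
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', col='column_name', col_wrap=int, col_order=['value', 'value'])
kind: 'scatter' or 'line'
col='column_name': one subplot per value of the column, arranged side by side in columns
row='column_name': one subplot per value of the column, stacked in rows (col and row can be used together)
col_wrap=int: maximum number of subplots per row before wrapping to the next row (cannot be combined with row)
col_order=['value', 'value']: list that sets the order of the subplots, using values of the column passed to col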
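The difference between col and row is easiest to see from the shape of the resulting grid. A minimal sketch, assuming a made-up DataFrame with hypothetical columns total_bill, tip and time:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 8.9, 25.4, 18.3, 12.7],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0, 2.5, 2.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
})

# col: one subplot per value of "time", placed side by side
g_cols = sns.relplot(x="total_bill", y="tip", data=df, kind="scatter",
                     col="time", col_order=["Lunch", "Dinner"])

# row: one subplot per value of "time", stacked vertically
g_rows = sns.relplot(x="total_bill", y="tip", data=df, kind="scatter",
                     row="time")

# The axes array is indexed as (row, column)
cols_shape = g_cols.axes.shape
rows_shape = g_rows.axes.shape
```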
Customizing scatter plots
Subgroups with point size and hue
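sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', size='column_name', hue='column_name')
Passing the same column to both size and hue makes the point sizes easier to tell apart
Subgroups with point style
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', hue='column_name', style='column_name')
Changing point transparency
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter', alpha=float)
alpha takes a number between 0 (fully transparent) and 1 (fully opaque)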
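The three customizations can also be used together. The sketch below assumes a made-up DataFrame with hypothetical columns total_bill, tip, party_size and smoker; the alpha value of 0.4 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 8.9, 25.4],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0],
    "party_size": [1, 2, 2, 4, 1, 3],
    "smoker": ["Yes", "No", "No", "Yes", "No", "Yes"],
})

g = sns.relplot(
    x="total_bill", y="tip", data=df, kind="scatter",
    size="party_size",  # larger parties get larger points
    hue="party_size",   # ...and a different color
    style="smoker",     # marker shape depends on the subgroup
    alpha=0.4,          # 0 = fully transparent, 1 = fully opaque
)
n_points = g.ax.collections[0].get_offsets().shape[0]
```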
Introduction to lineplots
What are line plots?
The visualization of choice when we need to track the same thing over time
sns.relplot(x='column_name', y='column_name', data=df, kind='line', hue='column_name', style='column_name', markers=True, dashes=False)
markers=True displays a marker for each data point
dashes=False draws all the lines with the same solid style
If a line plot is given multiple observations per x-value, it aggregates them into a single summary measure
By default, it displays the mean, with a shaded region around the line showing the 95% confidence interval, which indicates the uncertainty in our estimate of the mean
Replacing the confidence interval with the standard deviation
sns.relplot(x='column_name', y='column_name', data=df, kind='line', ci='sd')
This shows the spread of the distribution of observations at each x-value
Turning off the confidence interval
sns.relplot(x='column_name', y='column_name', data=df, kind='line', ci=None)
In seaborn 0.12 and later the ci parameter is deprecated; the equivalents are errorbar='sd' and errorbar=None
Count plots and bar plots
Categorical plots
Examples: count plots, bar plots
Involve a categorical variable, which consists of a fixed, typically small number of possible values or categories
Used for comparisons between groups
catplot()
Used to create categorical plots
Same advantages as relplot()
Easily create subplots with col= and row=
sns.catplot(x='column_name', data=df, kind='count')
Changing the order
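First we create a list with the category order that we want, then we pass it to the order parameter
category_order = ['value', 'value']
sns.catplot(x='column_name', data=df, kind='count', order=category_order)
Avoid naming the variable list, since that shadows the Python built-in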
Bar plots
Displays the mean of a quantitative variable among the observations in each category
sns.catplot(x='column_name', y='column_name', data=df, kind='bar')
Confidence intervals
Lines show 95% confidence intervals for the mean
They show the uncertainty about our estimate
They assume our data is a random sample
To turn them off, pass ci=None:
sns.catplot(x='column_name', y='column_name', data=df, kind='bar', ci=None)
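That a bar plot shows per-category means can be verified against a groupby. A minimal sketch, assuming a made-up DataFrame with hypothetical columns day and tip:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "tip": [2.0, 4.0, 1.0, 2.0, 3.0],
})

category_order = ["Thur", "Fri"]
g = sns.catplot(x="day", y="tip", data=df, kind="bar", order=category_order)

# Bar heights follow the category order and equal the group means
means = df.groupby("day")["tip"].mean()
bar_heights = [p.get_height() for p in g.ax.patches][:len(category_order)]
```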
Changing the orientation
Switch the x and y parameters to draw the plot horizontally
It is common practice to put the categorical variable on the x-axis
Box plots
What is a box plot?
Shows the distribution of quantitative data
The colored box represents the 25th to 75th percentile, and the line in the middle represents the median
The whiskers give a sense of the spread of the distribution, and the floating points represent outliers
Commonly used to compare the distribution of a quantitative variable across different groups of a categorical variable
How to create a box plot
sns.catplot(x='column_name', y='column_name', data=df, kind='box', order=category_order)
Omitting the outliers using sym
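sns.catplot(x='column_name', y='column_name', data=df, kind='box', sym='')
sym='' is passed through to Matplotlib and draws the outliers with an empty marker
If your seaborn version does not accept sym, showfliers=False hides the outliers as well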
Changing the whisker using whis
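By default, the whiskers extend to 1.5 times the interquartile range (IQR) beyond the edges of the box
whis=number changes that multiplier, for example whis=2.0 for 2.0 times the IQR
whis=[5, 95] draws the whiskers at the 5th and 95th percentiles
whis=[0, 100] draws the whiskers at the min and max values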
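The default 1.5 × IQR rule can be worked through by hand. The sketch below uses made-up values and a hypothetical column layout (group, value); it computes the limits with pandas and then draws a box plot whose whiskers span the full range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values with one extreme observation
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1 = values.quantile(0.25)  # bottom edge of the box
q3 = values.quantile(0.75)  # top edge of the box
iqr = q3 - q1

# Default limits: the whiskers reach the furthest observations inside them
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything outside the limits is drawn as a floating point (outlier)
outliers = values[(values < lower_limit) | (values > upper_limit)]

# whis=[0, 100] stretches the whiskers to the min and max instead
df = pd.DataFrame({"group": "A", "value": values})
g = sns.catplot(x="group", y="value", data=df, kind="box", whis=[0, 100])
```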
Point plots
What are point plots?
Shows the mean of a quantitative variable for the observations in each category, plotted as a single point
Vertical lines show 95% confidence intervals
Point plots vs line plots
Both show:
Mean of a quantitative variable
95% confidence intervals
Differences:
Line plot has a quantitative variable on the x-axis
Point plot has a categorical variable on the x-axis
Creating a point plot
sns.catplot(x='column_name', y='column_name', data=df, kind='point', hue='column_name')
Disconnecting the points
sns.catplot(x='column_name', y='column_name', data=df, kind='point', hue='column_name', join=False)
Display the median
from numpy import median
sns.catplot(x='column_name', y='column_name', data=df, kind='point', estimator=median)
We may prefer the median over the mean because it is more robust to outliers, so if the data has a lot of outliers the median may be a better statistic to use
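The robustness argument is easy to see with a single outlier. A sketch under assumptions: the data is made up and the column names (group, value) are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: group A contains one extreme value (100)
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [1, 2, 3, 4, 100, 2, 3, 4, 5, 6],
})

# The outlier drags the mean of A far from the typical values...
means = df.groupby("group")["value"].mean()
# ...while the median barely moves
medians = df.groupby("group")["value"].median()

# Plot the median of each category instead of the mean
g = sns.catplot(x="group", y="value", data=df, kind="point",
                estimator=np.median, order=["A", "B"])
```

With these values the mean of group A is 22 while its median is 3, which is much closer to the bulk of the observations.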
Customizing the confidence intervals
sns.catplot(x='column_name', y='column_name', data=df, kind='point', capsize=0.2)
Changing plot style and color
Why customize?
Personal preference
Improve readability
Guide interpretation
Changing the figure style
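The figure style includes the background and axes
Preset options: 'white', 'dark', 'whitegrid', 'darkgrid', 'ticks'
sns.set_style() sets the style globally, for every plot created afterwards
Call sns.set_style() before generating the plot
With no style set, plots use Matplotlib's defaults, which look closest to 'white'; calling sns.set_theme() with no arguments applies 'darkgrid'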
Changing the palette
The figure "palette" changes the color of the main elements of the plot
sns.set_palette()
Use preset palettes or create a custom palette
Diverging palettes
Seaborn has a group of preset palettes called diverging palettes. They are a good choice when the visualization deals with a scale whose two ends are opposites and that has a neutral midpoint
Palettes: 'RdBu', 'PRGn'
Append _r to the name (for example 'RdBu_r') to reverse the palette
As before, call sns.set_palette() before generating the plot
Sequential palettes
A single color, or two colors blended, moving from light to dark values
Great for emphasizing a variable on a continuous scale
Custom palettes
We can create our own palette by building a list of colors (color names or hex codes) and passing it to sns.set_palette()
Changing the scale
The figure "context" changes the scale of the plot elements and labels
sns.set_context()
Scale options from smallest to largest: 'paper', 'notebook', 'talk', 'poster'
Default is 'notebook'
Call sns.set_context() before generating the plot
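The three global settings can be applied one after another before any plot is created. This is a minimal sketch; the two hex colors in the custom palette are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import seaborn as sns

# Background and axes: white background with grid lines
sns.set_style("whitegrid")

# Main element colors: a custom palette given as a list of hex codes
custom_palette = ["#39A7D0", "#36ADA4"]
sns.set_palette(custom_palette)

# Scale of elements and labels: larger, for a presentation
sns.set_context("talk")

# Inspect the settings that plots created from now on will use
current_style = sns.axes_style()
current_palette = sns.color_palette()
```

All three calls change global state, so they affect every plot created afterwards in the same session.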
Adding titles and labels: Part 1
FacetGrid vs AxesSubplot objects
Seaborn plots create two different types of objects: FacetGrid and AxesSubplot
To figure out which type of object you're working with, first assign the plot output to a variable, commonly named 'g', then check its type:
type(g)
Recent Matplotlib versions report the second type as Axes rather than AxesSubplot
An empty FacetGrid
A FacetGrid consists of one or more AxesSubplots, which is how it supports subplots
relplot() and catplot() create FacetGrid objects
scatterplot(), countplot(), etc. create AxesSubplot objects
Adding a title to FacetGrid
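First assign the plot to the variable 'g'
g.fig.suptitle('Name of the plot', y=float)
y sets the height of the title in figure coordinates; a value slightly above 1, such as 1.03, moves the title above the plot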
Adding titles and labels: Part 2
Adding a title to AxesSubplot
Assign the plot to the variable 'g'
g.set_title('Name of the plot', y=float)
Titles for subplots
With several subplots, g.fig.suptitle() titles the figure as a whole
To give each subplot its own title, use g.set_titles('This is {col_name}')
{col_name} is replaced by the value of the col= variable shown in that subplot
Adding axis labels
Assign the plot to the variable 'g'
g.set(xlabel='Name', ylabel='Name')
Rotating x-axis tick labels
Assign the plot to the variable 'g', then call the Matplotlib function:
plt.xticks(rotation=90)
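The title and label calls from both parts can be sketched together as follows. The DataFrame is made up, and the column names (day, time, total_bill, tip) and title texts are hypothetical. Note that plt.xticks() acts on the current axes only, which here is the last subplot drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "day": ["Thur", "Fri"] * 4,
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"] * 2,
    "total_bill": [10.5, 20.0, 15.2, 30.1, 8.9, 25.4, 18.3, 12.7],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0, 2.5, 2.0],
})

# catplot() returns a FacetGrid
g = sns.catplot(x="day", y="tip", data=df, kind="bar",
                col="time", col_order=["Lunch", "Dinner"])
g.fig.suptitle("Average tip by day", y=1.03)      # title for the whole figure
g.set_titles("This is {col_name}")                # one title per subplot
g.set(xlabel="Day of week", ylabel="Average tip")
plt.xticks(rotation=90)                           # rotates ticks on the current axes

# scatterplot() returns a single Matplotlib axes object
fig, ax = plt.subplots()
ax = sns.scatterplot(x="total_bill", y="tip", data=df, ax=ax)
ax.set_title("Tip vs. total bill")
```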
Putting it all together
Getting started
import seaborn as sns
import matplotlib.pyplot as plt
To show the plot: plt.show()
Relational plots
Show the relationship between two quantitative variables
Examples: scatter plots, line plots
sns.relplot(x='column_name', y='column_name', data=df, kind='scatter')  # or kind='line'
Categorical plots
Show the distribution of a quantitative variable within categories defined by a categorical variable
Examples: bar plots, count plots, box plots, point plots
sns.catplot(x='column_name', y='column_name', data=df, kind='bar')  # or 'count', 'box', 'point'
Adding a third variable (hue)
Setting hue will create subgroups that are displayed as different colors on a single plot
Adding a third variable (row/col)
Setting row and/or col in relplot() or catplot() will create subgroups that are displayed on separate subplots
Customization
Change the background: sns.set_style()
Change the main element colors: sns.set_palette()
Change the scale: sns.set_context()
Adding a title
FacetGrid: g.fig.suptitle()
AxesSubplot: g.set_title()
Final touches
Add x- and y-axis labels: g.set(xlabel='name', ylabel='name')
Rotate x-tick labels: plt.xticks(rotation=90)