Introduction to Data Visualization with ggplot2
Take Notes
Graph layer elements can be stored as objects for further modification.
ggplot() -> initializes the plot and specifies the dataset.
aes() -> usually part of ggplot(), but can also be specified in geom_*() when mixing data for visualization. Specifies x and y, plus other aesthetics such as fill, color, size, label, shape and alpha (transparency). These can be set to a fixed value (as attributes, outside aes()) or mapped to a third variable (inside aes()). Operations between variables can also be specified.
geom_*() -> specifies the geometry; accepts arguments for aesthetics and visual attributes. It also takes a position argument. For jitter, a position_jitter() object with pre-specified settings (e.g. width and seed) can be reused to add the same random noise consistently.
labs(x=,y=) -> specifies axis labels.
geom_smooth -> adds trend lines
theme -> modifies non-data characteristics of the graph
AESTHETICS MAP VARIABLES ONTO THE GRAPH, WHILE ATTRIBUTES ONLY MODIFY VISUAL CHARACTERISTICS.
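The difference between mapping an aesthetic and setting an attribute can be sketched as follows. This is a minimal illustration on the built-in mtcars data; the object names (base, p_mapped, p_fixed) and the chosen colors are assumptions made for the example, not part of the course material.

```r
library(ggplot2)

# Base layers stored as an object so that geoms can be added later
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Aesthetic: color is mapped to a variable inside aes(), so a legend is drawn
p_mapped <- base +
  geom_point(aes(color = factor(cyl)), alpha = 0.6)

# Attribute: color and size are fixed values outside aes(), so no legend
p_fixed <- base +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", se = FALSE) +   # linear trend line
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Printing p_mapped or p_fixed renders the plot; both reuse the same stored base object.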
About scatter plots
Overplotting needs to be considered with:
- Large datasets
- Aligned values on a single axis
- Low-precision data
- Integer data
Position arguments (e.g. jitter) can mitigate this.
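A minimal sketch of jittering values that are aligned on one axis, using the built-in mtcars data. The width, seed and alpha values are arbitrary choices for illustration, and posn_j is a hypothetical name.

```r
library(ggplot2)

# A position object with a fixed seed produces the same noise on every render
posn_j <- position_jitter(width = 0.2, height = 0, seed = 136)

# cyl only takes three values, so points would otherwise stack in columns
p_jitter <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(position = posn_j, alpha = 0.5, shape = 16)
```

Setting height = 0 keeps the mpg values unchanged, so noise is added only along the categorical axis.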
About histograms and barplots.
Histograms: specifying position and transparency helps to visualize counts across different categories.
Barplots: can also take a position argument to overlap or dodge groups.
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to identity, with transparency 0.4
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.4)
# position can be changed to "dodge", or to "fill" if only proportions matter
# About barplots
ggplot(mtcars, aes(cyl, fill = fam)) +
  # Set the transparency to 0.6
  geom_bar(position = position_dodge(width = 0.2), alpha = 0.6)
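The "fill" position mentioned in the histogram comment can be sketched as below. The setup cell that creates fam is hidden, so this example assumes fam is am recoded as a factor and rebuilds it on a local copy called cars (a hypothetical name).

```r
library(ggplot2)

# Assumption: fam is am (0 = automatic, 1 = manual) recoded as a factor
cars <- mtcars
cars$fam <- factor(cars$am, labels = c("automatic", "manual"))

# position = "fill" rescales each bin to 1, so only proportions are shown
p_fill <- ggplot(cars, aes(mpg, fill = fam)) +
  geom_histogram(binwidth = 1, position = "fill") +
  labs(y = "Proportion")
```

Bins with no observations have no proportion to show, so ggplot2 may warn about removed rows when this plot is rendered.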
About line plots: helpful for showing counts over time.
# Plot multiple time series by coloring by species
ggplot(fish.tidy, aes(x = Year, y = Capture, color = Species)) +
  geom_line()
Theme Layer
It modifies non-data elements:
- text, with element_text()
- rectangle, with element_rect()
- line, with element_line()
Setting an element to element_blank() removes it.
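A minimal sketch of the three element types and element_blank() in one theme() call, using the built-in mtcars data. The specific theme elements and colors chosen here are illustrative assumptions.

```r
library(ggplot2)

p_themed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    axis.title       = element_text(color = "grey30", face = "bold"),  # text
    panel.background = element_rect(fill = "white"),                   # rectangle
    axis.line        = element_line(color = "black"),                  # line
    panel.grid       = element_blank()                                 # removed
  )
```

Because theme() is just another layer, the same call can be stored as an object and added to several plots.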
A different approach is needed when the graph is an explanatory visualization intended for a broad audience.
Exercise: 2007 data on life expectancy by country.