Exploratory Data Analysis in R for Absolute Beginners
In this code-along, we'll learn the basics of R, the Tidyverse package, and the Lubridate package to determine which user onboarding flow the product team at DataCamp should focus on this quarter.
We will be using a synthetic dataset titled "user_page_view_history.csv", available in our Workspace today.
This data set contains:
- Page view history of users.
- The pages each user visited, the page that referred them, the onboarding flow label, and the date stamp of the visit.
Table Of Contents
- The Foundations
- Exploring the Data
- Onboarding Flows
The Foundations of R
What is R?
R is a programming language we can use to tell computers what to do! R can be used for problems ranging from as simple as adding 2 + 2 to as complex as prototyping a dashboard.
# We could use R like a calculator
2 + 2
What is a variable?
Sometimes after we have performed a calculation, we would like to save the output. We can do this by storing outputs as variables!
You can think of a variable as a box. Much like the boxes we use on a day-to-day basis, a variable can store an object and be given a name so we can find it again later. In R we store data objects inside variables using the assignment operator "<-".
Here are three common data objects we like to store:
- Numbers (thenumberfour <- 4)
- Strings (thewordhello <- "hello")
- Results of calculations (resultoftwoplustwo <- 2+2)
# Here are three examples
thenumberfour <- 4
thewordhello <- "hello"
resultoftwoplustwo <- 2+2
# We can view the contents of a variable by typing its name
resultoftwoplustwo
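As a small sketch of why storing values is useful, the example below reuses stored variables in a later calculation. The names price_per_seat, seats, and total_cost are made up for illustration and are not part of the code-along dataset.

```r
# Hypothetical values, chosen only for illustration
price_per_seat <- 25
seats <- 4

# Reuse the stored values in a new calculation and store that result too
total_cost <- price_per_seat * seats

# Typing the variable's name prints its contents
total_cost
```

If we later change seats to a different number and rerun the calculation, total_cost updates without us retyping the arithmetic.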
What is a function?
Similar to functions in mathematics, functions in R take some input and create some output. In programming you can create custom functions, although we won't be covering that in this lesson, or use the premade functions available to us in R. For example, we could write our own function that computes 2 + 2, or we could use the premade function sum(2, 2).
# Here is a simple and commonly used function
x <- sum(2,2)
x
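To make the comparison concrete, here is a minimal sketch of a custom function next to the premade sum(). The name add_two_numbers is hypothetical, and writing your own functions is outside the scope of this lesson, so treat this purely as a preview.

```r
# A custom function (hypothetical name) that adds its two inputs
add_two_numbers <- function(a, b) {
  a + b
}

# Both calls take the inputs 2 and 2 and create an output
custom_result <- add_two_numbers(2, 2)
premade_result <- sum(2, 2)

# Compare the two outputs
custom_result == premade_result
```

Both approaches follow the same pattern: the values inside the parentheses are the input, and whatever the function hands back is the output.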
Today we will be using these twelve functions:
- library()
- read_csv()
- select()
- filter()
- mutate()
- floor_date()
- group_by()
- summarise()
- n()
- mean()
- ungroup()
- count()
Let's try using the library() function to load the tidyverse package.
# Load tidyverse
library(tidyverse)
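library() stops with an error if the package has not been installed. If you are running R somewhere the package might be missing, a cautious sketch is to check first with requireNamespace() and load the package only when the check succeeds. The variable name tidyverse_installed is made up for this example.

```r
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  # Attach the package so its functions can be called by name
  library(tidyverse)
} else {
  # Suggest the usual installation command instead of failing
  message('tidyverse is not installed; try install.packages("tidyverse")')
}
```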