Modeling with Data in the Tidyverse

Explore different types of data modeling, including modeling for explanation and for prediction, and learn how to fit linear regression models and apply model assessment measures in the Tidyverse.

4 Hours, 17 Videos, 49 Exercises

22,412 Learners, Statement of Accomplishment

Course Description

In this course, you will learn to model with data. Models attempt to capture the relationship between an outcome variable of interest and a series of explanatory/predictor variables. Such models can be used for both explanatory purposes, e.g., "Does knowing professors' ages help explain their teaching evaluation scores?", and predictive purposes, e.g., "How well can we predict a house's price based on its size and condition?" You will leverage your tidyverse skills to construct and interpret such models. This course centers on linear regression, one of the most commonly used and easy-to-understand approaches to modeling. Such modeling and thinking are used in a wide variety of fields, including statistics, causal inference, machine learning, and artificial intelligence.

1. Introduction to Modeling (Free)
This chapter introduces some background theory and terminology for modeling: the general modeling framework, the difference between modeling for explanation and modeling for prediction, and the modeling problem. You'll also perform your first exploratory data analysis, a crucial first step before any formal modeling.
Background on modeling for explanation
50 xp
Exploratory visualization of age
100 xp
Numerical summaries of age
100 xp
Background on modeling for prediction
50 xp
Exploratory visualization of house size
100 xp
Log10 transformation of house size
100 xp
The modeling problem for explanation
50 xp
EDA of relationship of teaching & "beauty" scores
100 xp
Correlation between teaching and "beauty" scores
100 xp
The modeling problem for prediction
50 xp
EDA of relationship of house price and waterfront
100 xp
Predicting house price with waterfront
100 xp
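
As a rough illustration of the exploratory steps this chapter lists (numerical summaries, a log10 transformation, and a correlation coefficient), here is a minimal sketch in base R. The data frame `houses_toy` and its values are invented for illustration and are not the course's dataset; the course itself may use different functions and tidyverse tooling.

```r
# Toy data, invented for illustration only (not the course's dataset).
houses_toy <- data.frame(
  sqft_living = c(800, 1200, 1500, 2000, 2600, 4000, 9000),
  price       = c(200000, 310000, 350000, 480000, 600000, 950000, 2500000)
)

# Numerical summaries of a single variable.
print(summary(houses_toy$sqft_living))

# Right-skewed variables are often easier to inspect on a log10 scale.
houses_toy$log10_size  <- log10(houses_toy$sqft_living)
houses_toy$log10_price <- log10(houses_toy$price)

# Correlation coefficient between two numerical variables.
size_price_cor <- cor(houses_toy$log10_size, houses_toy$log10_price)
print(size_price_cor)
```

A correlation close to 1 or -1 suggests a strong linear relationship, which is worth checking visually before fitting any model.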
2. Modeling with Basic Regression
Equipped with your understanding of the general modeling framework, in this chapter you'll cover basic linear regression, keeping things simple by modeling the outcome variable y as a function of a single explanatory/predictor variable x. You'll use both numerical and categorical x variables. The outcome variable of interest in this chapter will be teaching evaluation scores of instructors at the University of Texas at Austin.
Explaining teaching score with age
50 xp
Plotting a "best-fitting" regression line
100 xp
Fitting a regression with a numerical x
100 xp
Predicting teaching score using age
50 xp
Making predictions using "beauty score"
100 xp
Computing fitted/predicted values & residuals
100 xp
Explaining teaching score with gender
50 xp
EDA of relationship of score and rank
100 xp
Fitting a regression with a categorical x
100 xp
Predicting teaching score using gender
50 xp
Making predictions using rank
50 xp
Visualizing the distribution of residuals
100 xp
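
To make the ideas in this chapter concrete (fitting a regression with one numerical x, then computing fitted values, residuals, and a prediction), here is a minimal sketch using base R's `lm()`. The data frame `evals_toy` and its values are made up for illustration; they are not the teaching evaluation data used in the course, and the course may rely on other helper functions.

```r
# Toy data, invented for illustration only.
evals_toy <- data.frame(
  age   = c(30, 35, 40, 45, 50, 55, 60, 65),
  score = c(4.6, 4.4, 4.5, 4.2, 4.3, 4.0, 4.1, 3.9)
)

# Fit score as a linear function of age: score = b0 + b1 * age.
model_age <- lm(score ~ age, data = evals_toy)
print(coef(model_age))

# Fitted (predicted) values and residuals (observed minus fitted).
evals_toy$score_hat <- fitted(model_age)
evals_toy$residual  <- residuals(model_age)

# Prediction for a hypothetical new instructor aged 42.
predicted_score <- predict(model_age, newdata = data.frame(age = 42))
print(predicted_score)
```

The residuals of a least-squares fit with an intercept sum to zero, which is a quick sanity check on the computation.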
3. Modeling with Multiple Regression
In the previous chapter, you learned about basic regression using a single numerical or categorical predictor. But why limit yourself to only one variable to inform your explanations/predictions? You will now extend basic regression to multiple regression, which allows you to incorporate more than one explanatory/predictor variable in your models. You'll be modeling house prices using a dataset of houses in the Seattle, WA metropolitan area.
Explaining house price with year & size
50 xp
EDA of relationship
100 xp
Fitting a regression
100 xp
Predicting house price using year & size
50 xp
Making predictions using size and bedrooms
100 xp
Interpreting residuals
100 xp
Explaining house price with size & condition
50 xp
Parallel slopes model
100 xp
Interpreting the parallel slopes model
50 xp
Predicting house price using size & condition
50 xp
Making predictions using size and waterfront
100 xp
Automating predictions on "new" houses
100 xp
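
The parallel slopes model and the "new houses" prediction step named in this chapter can be sketched as follows in base R. This is only an illustration under stated assumptions: `houses_toy`, `new_houses`, and all values are invented, and both price and size are assumed to be on a log10 scale.

```r
# Toy data, invented for illustration only.
houses_toy <- data.frame(
  log10_size  = c(3.0, 3.1, 3.2, 3.3, 3.4, 3.0, 3.1, 3.2, 3.3, 3.4),
  waterfront  = factor(rep(c("no", "yes"), each = 5), levels = c("no", "yes")),
  log10_price = c(5.30, 5.38, 5.47, 5.55, 5.62, 5.72, 5.80, 5.88, 5.97, 6.05)
)

# Parallel slopes: one shared slope for size, a separate intercept per group.
model_ps <- lm(log10_price ~ log10_size + waterfront, data = houses_toy)
print(coef(model_ps))

# Apply the fitted model to hypothetical "new" houses.
new_houses <- data.frame(
  log10_size = c(3.15, 3.35),
  waterfront = factor(c("no", "yes"), levels = c("no", "yes"))
)
new_houses$log10_price_hat <- predict(model_ps, newdata = new_houses)

# Undo the log10 transformation to express predictions in the original units.
new_houses$price_hat <- 10^new_houses$log10_price_hat
print(new_houses)
```

The `waterfrontyes` coefficient is the vertical offset between the two parallel lines, so it reads as the difference in predicted log10 price at any fixed size.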
4. Model Assessment and Selection
In the previous chapters, you fit various models to explain or predict an outcome variable of interest. But how do you know which model to choose? Model assessment measures allow you to assess how well an explanatory model "fits" a set of data or how accurate a predictive model is. Based on these measures, you'll learn about criteria for determining which models are "best".
Model selection and assessment
50 xp
Refresher: sum of squared residuals
100 xp
Which model to select?
50 xp
Assessing model fit with R-squared
50 xp
Computing the R-squared of a model
100 xp
Comparing the R-squared of two models
100 xp
Assessing predictions with RMSE
50 xp
Computing the MSE & RMSE of a model
100 xp
Comparing the RMSE of two models
100 xp
Validation set prediction framework
50 xp
Fitting model to training data
100 xp
Predicting on test data
100 xp
Conclusion - Where to go from here?
50 xp
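
The assessment measures named in this chapter (R-squared, MSE, RMSE) and the validation set framework can be sketched in base R as below. The data are simulated, the 70/30 split and all object names are assumptions made for this example, and the course may compute these quantities with different tools.

```r
# Simulated data, for illustration only.
set.seed(76)
n <- 100
houses_sim <- data.frame(log10_size = runif(n, 2.8, 3.8))
houses_sim$log10_price <- 2.7 + 0.85 * houses_sim$log10_size + rnorm(n, sd = 0.1)

# Validation set framework: shuffle rows, then split into training and test sets.
shuffled <- houses_sim[sample(n), ]
train <- shuffled[1:70, ]
test  <- shuffled[71:100, ]

# Fit the model on the training data only.
model_train <- lm(log10_price ~ log10_size, data = train)

# R-squared: share of the outcome's variance explained by the model.
r_squared <- 1 - var(residuals(model_train)) / var(train$log10_price)

# MSE and RMSE on the held-out test data measure predictive accuracy.
test_residuals <- test$log10_price - predict(model_train, newdata = test)
mse  <- mean(test_residuals^2)
rmse <- sqrt(mse)

print(c(r_squared = r_squared, mse = mse, rmse = rmse))
```

Comparing RMSE on the test set, not on the training set, guards against favoring a model simply because it has memorized the data it was fit to.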

In the following tracks

Tidyverse Fundamentals with R

Collaborators

Chester Ismay

Sumedh Panchadhar

Benjamin Feder

Prerequisites

Data Manipulation with dplyr

Albert Y. Kim

Associate Professor of Statistical & Data Sciences, Smith College

Join over 13 million learners and start Modeling with Data in the Tidyverse today!
