In this course you will learn how to predict future events using linear regression, generalized additive models, random forests, and xgboost.

4 hours, 19 videos, 65 exercises, 28,107 learners

5300 XP

From a machine learning perspective, regression is the task of predicting numerical outcomes from various inputs. In this course, you'll learn about different regression models, how to train these models in R, how to evaluate the models you train, and how to use them to make predictions.

### 1. What is Regression?

**Free**

In this chapter, we introduce the concept of regression from a machine learning point of view. We present the fundamental regression method, linear regression, and show how to fit a linear regression model and make predictions from it.
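As a rough illustration of what fitting and predicting mean here, the sketch below fits a one-variable least-squares line and predicts from it. The course itself works in R; this example uses plain Python only to show the arithmetic, and the data and the helper names `fit_line` and `predict` are invented for the example.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Made-up data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(xs, ys)
print(model)                # (1.0, 2.0)
print(predict(model, 6.0))  # 13.0
```

Multivariate linear regression follows the same idea with one coefficient per input.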

- Welcome and Introduction (50 XP)
- Identify the regression tasks (50 XP)
- Linear regression - the fundamental method (50 XP)
- Code a simple one-variable regression (100 XP)
- Examining a model (100 XP)
- Predicting once you fit a model (50 XP)
- Predicting from the unemployment model (100 XP)
- Multivariate linear regression (Part 1) (100 XP)
- Multivariate linear regression (Part 2) (100 XP)
- Wrapping up linear regression (50 XP)

### 2. Training and Evaluating Regression Models

Now that we have learned how to fit basic linear regression models, we will learn how to evaluate how well our models perform. We will review evaluating a model graphically, and look at two basic metrics for regression models. We will also learn how to train a model that will perform well in the wild, not just on training data. Although we will demonstrate these techniques using linear regression, all these concepts apply to models fit with any regression algorithm.
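The two basic metrics listed for this chapter are RMSE and R-squared, and the simplest way to check performance beyond the training data is a random test/train split. A minimal sketch of all three follows, assuming plain Python lists rather than the R data frames used in the course; the function names and numbers are illustrative only.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Fraction of the outcome's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    tss = sum((a - mean_a) ** 2 for a in actual)                # total sum of squares
    return 1.0 - rss / tss


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up outcomes and predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(round(rmse(actual, predicted), 3))       # 0.707
print(round(r_squared(actual, predicted), 3))  # 0.9

train, test = train_test_split(list(range(20)))
print(len(train), len(test))                   # 15 5
```

In practice the model is fit on `train` only, and the metrics are computed on `test`.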

- Evaluating a model graphically (50 XP)
- Graphically evaluate the unemployment model (100 XP)
- The gain curve to evaluate the unemployment model (100 XP)
- Root Mean Squared Error (RMSE) (50 XP)
- Calculate RMSE (100 XP)
- R-Squared (50 XP)
- Calculate R-Squared (100 XP)
- Correlation and R-squared (100 XP)
- Properly Training a Model (50 XP)
- Generating a random test/train split (100 XP)
- Train a model using test/train split (100 XP)
- Evaluate a model using test/train split (100 XP)
- Create a cross validation plan (100 XP)
- Evaluate a modeling procedure using n-fold cross-validation (100 XP)

### 3. Issues to Consider

Before moving on to more sophisticated regression techniques, we will look at some other modeling issues: modeling with categorical inputs, interactions between variables, and when you might consider transforming inputs and outputs before modeling. While more sophisticated regression techniques manage some of these issues automatically, it's important to be aware of them, in order to understand which methods best handle various issues -- and which issues you must still manage yourself.
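One reason to consider transforming an output such as a monetary amount is that plain RMSE is dominated by the largest values, while relative error treats a 10% miss the same at any scale. The sketch below compares the two on made-up numbers; the function names are illustrative and are not taken from the course materials.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error in the outcome's own units."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rms_relative_error(actual, predicted):
    """Root mean squared relative error: each error is scaled by the true value."""
    squared = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up amounts: both predictions miss by 10%, but at very different scales.
actual = [100.0, 1000.0]
predicted = [110.0, 900.0]

print(round(rmse(actual, predicted), 2))                # 71.06
print(round(rms_relative_error(actual, predicted), 3))  # 0.1
```

The RMSE here is driven almost entirely by the larger amount, whereas the relative error reports the same 10% miss for both.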

- Categorical inputs (50 XP)
- Examining the structure of categorical inputs (100 XP)
- Modeling with categorical inputs (100 XP)
- Interactions (50 XP)
- Modeling an interaction (100 XP)
- Modeling an interaction (2) (100 XP)
- Transforming the response before modeling (50 XP)
- Relative error (100 XP)
- Modeling log-transformed monetary output (100 XP)
- Comparing RMSE and root-mean-squared Relative Error (100 XP)
- Transforming inputs before modeling (50 XP)
- Input transforms: the "hockey stick" (100 XP)
- Input transforms: the "hockey stick" (2) (100 XP)

### 4. Dealing with Non-Linear Responses

Now that we have mastered linear models, we will begin to look at techniques for modeling situations that don't meet the assumptions of linearity. This includes predicting probabilities and frequencies (values bounded between 0 and 1); predicting counts (nonnegative integer values, and associated rates); and responses that have a non-linear but additive relationship to the inputs. These algorithms are variations on the standard linear model.
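To see why these methods count as variations on the linear model, it helps to note that logistic and Poisson regression both start from an ordinary linear score and then pass it through a function that keeps the prediction in the valid range. The following is a minimal sketch of that last step only (not of model fitting), with hypothetical coefficients and inputs.

```python
import math


def linear_score(intercept, coefs, inputs):
    """The usual linear combination of inputs; it can be any real number."""
    return intercept + sum(c * x for c, x in zip(coefs, inputs))


def inverse_logit(score):
    """Logistic regression: map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def inverse_log(score):
    """Poisson regression: map a linear score to a positive expected count."""
    return math.exp(score)


# Hypothetical model: intercept -1.0, coefficients 0.5 and 2.0.
score = linear_score(-1.0, [0.5, 2.0], [2.0, 0.0])  # -1.0 + 1.0 + 0.0 = 0.0
print(inverse_logit(score))  # 0.5
print(inverse_log(score))    # 1.0

# Whatever the score, the outputs stay inside their valid ranges.
scores = [-3.0, 0.0, 3.0]
probabilities = [inverse_logit(s) for s in scores]
expected_counts = [inverse_log(s) for s in scores]
```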

- Logistic regression to predict probabilities (50 XP)
- Fit a model of sparrow survival probability (100 XP)
- Predict sparrow survival (100 XP)
- Poisson and quasipoisson regression to predict counts (50 XP)
- Poisson or quasipoisson (50 XP)
- Fit a model to predict bike rental counts (100 XP)
- Predict bike rentals on new data (100 XP)
- Visualize the bike rental predictions (100 XP)
- GAM to learn non-linear transforms (50 XP)
- Writing formulas for GAM models (50 XP)
- Writing formulas for GAM models (2) (50 XP)
- Model soybean growth with GAM (100 XP)
- Predict with the soybean model on test data (100 XP)

### 5. Tree-Based Methods

In this chapter we will look at modeling algorithms that do not assume linearity or additivity, and that can learn limited types of interactions among input variables. These algorithms are *tree-based* methods that work by combining ensembles of *decision trees* that are learned from the training data.
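As a toy picture of how an ensemble of decision trees produces a prediction, the sketch below averages three small trees, which is what a random forest does at prediction time for regression. The trees here are written by hand with invented thresholds and values; in a real model they would be learned from the training data, and the feature names `hour` and `temp` are assumptions made for the example.

```python
def tree_a(hour, temp):
    """A hand-written decision tree: each branch ends in a predicted count."""
    if hour < 7:
        return 20.0
    return 80.0 if temp < 15 else 150.0


def tree_b(hour, temp):
    if temp < 10:
        return 40.0
    return 60.0 if hour < 9 else 140.0


def tree_c(hour, temp):
    return 10.0 if hour < 6 else 120.0


def forest_predict(trees, hour, temp):
    """Average the predictions of every tree in the ensemble."""
    predictions = [tree(hour, temp) for tree in trees]
    return sum(predictions) / len(predictions)


trees = [tree_a, tree_b, tree_c]

# Warm evening: the trees say 150, 140 and 120, so the ensemble says about 136.7.
evening = forest_predict(trees, hour=17, temp=20)
# Cold early morning: the trees say 20, 40 and 10, so the ensemble says about 23.3.
early = forest_predict(trees, hour=5, temp=5)
print(round(evening, 1), round(early, 1))
```

Each tree splits on combinations of inputs, which is how such models pick up interactions without being told about them; no single tree needs to be accurate for the average to be useful.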

- The intuition behind tree-based methods (50 XP)
- Predicting with a decision tree (50 XP)
- Random forests (50 XP)
- Build a random forest model for bike rentals (100 XP)
- Predict bike rentals with the random forest model (100 XP)
- Visualize random forest bike model predictions (100 XP)
- One-Hot-Encoding Categorical Variables (50 XP)
- vtreat on a small example (100 XP)
- Novel levels (100 XP)
- vtreat the bike rental data (100 XP)
- Gradient boosting machines (50 XP)
- Find the right number of trees for a gradient boosting machine (100 XP)
- Fit an xgboost bike rental model and predict (100 XP)
- Evaluate the xgboost bike rental model (100 XP)
- Visualize the xgboost bike rental model (100 XP)

Prerequisites

Introduction to Regression in R

Co-founder, Principal Consultant at Win-Vector, LLC

John is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy. He is the author of several R packages, including the data treatment package vtreat. John is co-author of Practical Data Science with R and blogs at the Win-Vector Blog about data science and R programming. His interests include data science, statistics, R programming, and theoretical computer science.

Co-founder, Principal Consultant at Win-Vector, LLC

Nina is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy. She is co-author of the popular text Practical Data Science with R and occasionally blogs at the Win-Vector Blog on data science and R. Her technical interests include data science, statistics, statistical learning, and data visualization.

