Building Recommendation Engines with PySpark

Learn tools and techniques to leverage your own big data to facilitate positive experiences for your users.

4 Hours, 15 Videos, 56 Exercises

11,581 Learners, Statement of Accomplishment

Course Description

This course will show you how to build recommendation engines using Alternating Least Squares in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, this course will take you step by step through the intuition of the Alternating Least Squares algorithm as well as the code to train, test and implement ALS models on various types of customer data.

1
Recommendations Are Everywhere
Free
This chapter will show you how powerful recommendation engines can be and explain the key distinctions between collaborative-filtering engines and content-based engines, as well as the different types of implicit and explicit data that recommendation engines can use. You will also learn a very powerful way to uncover hidden features (latent features) that you may not even know exist in customer datasets.
Why learn how to build recommendation engines?
50 xp
See the power of a recommendation engine
100 xp
Power of recommendation engines
50 xp
Recommendation engine types and data types
50 xp
Collaborative vs content-based filtering
50 xp
Collaborative vs content based filtering part II
50 xp
Implicit vs explicit data
100 xp
Ratings data types
100 xp
Uses for recommendation engines
50 xp
Alternate uses of recommendation engines.
50 xp
Confirm understanding of latent features
100 xp
2
How does ALS work?
In this chapter you will review basic concepts of matrix multiplication and matrix factorization, and dive into how the Alternating Least Squares algorithm works and what arguments and hyperparameters it uses to return the best recommendations possible. You will also learn important techniques for properly preparing your data for ALS in Spark.
Overview of matrix multiplication
50 xp
Matrix multiplication
100 xp
Matrix multiplication part II
100 xp
Overview of matrix factorization
50 xp
Matrix factorization
100 xp
Non-negative matrix factorization
100 xp
How ALS alternates to generate predictions
50 xp
Estimating recommendations
100 xp
RMSE as ALS alternates
100 xp
Data preparation for Spark ALS
50 xp
Correct format and distinct users
100 xp
Assigning integer id's to movies
100 xp
ALS parameters and hyperparameters
50 xp
Build out an ALS model
100 xp
Build RMSE evaluator
100 xp
Get RMSE
100 xp
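
The alternating idea this chapter describes can be sketched in plain Python. This is a minimal rank-1 illustration, not the course's code and not Spark's implementation (in Spark the model is `pyspark.ml.recommendation.ALS`). The ratings, the regularization value `reg`, and all function names are made up for the example. Each pass holds one set of factors fixed and solves a small regularized least-squares problem for the other set, then swaps.

```python
# Rank-1 ALS sketch on a tiny, hypothetical ratings table.
# Each rating is approximated by user_f[user] * item_f[item].
ratings = {
    ("u1", "m1"): 5.0, ("u1", "m2"): 3.0,
    ("u2", "m1"): 4.0, ("u2", "m3"): 1.0,
    ("u3", "m2"): 2.0, ("u3", "m3"): 1.0,
}
reg = 0.1  # assumed regularization strength, chosen only for illustration

users = sorted({u for u, _ in ratings})
items = sorted({i for _, i in ratings})
user_f = {u: 1.0 for u in users}  # start every factor at 1.0
item_f = {i: 1.0 for i in items}


def squared_error():
    """Sum of squared errors over the observed ratings only."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())


def rmse():
    return (squared_error() / len(ratings)) ** 0.5


def loss():
    """Regularized objective that each alternating step minimizes."""
    penalty = sum(v * v for v in user_f.values()) + sum(v * v for v in item_f.values())
    return squared_error() + reg * penalty


def update_users():
    # Item factors are held fixed; each user factor has a closed-form solution.
    for u in users:
        seen = [(i, r) for (uu, i), r in ratings.items() if uu == u]
        num = sum(r * item_f[i] for i, r in seen)
        den = sum(item_f[i] ** 2 for i, _ in seen) + reg
        user_f[u] = num / den


def update_items():
    # User factors are held fixed; each item factor has a closed-form solution.
    for i in items:
        seen = [(u, r) for (u, ii), r in ratings.items() if ii == i]
        num = sum(r * user_f[u] for u, r in seen)
        den = sum(user_f[u] ** 2 for u, _ in seen) + reg
        item_f[i] = num / den


losses = [loss()]
for _ in range(10):
    update_users()
    losses.append(loss())
    update_items()
    losses.append(loss())

# Predicted rating for a pair the table never observed.
prediction_u1_m3 = user_f["u1"] * item_f["m3"]
```

Because each step exactly minimizes the regularized objective for one side, `losses` never increases from one entry to the next. A real model would use more than one latent feature per user and item, which turns each closed-form update into a small matrix solve.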
3
Recommending Movies
In this chapter you will be introduced to the MovieLens dataset. You will walk through how to assess its use for ALS, build out a full cross-validated ALS model on it, and learn how to evaluate its performance. This will be the foundation for all subsequent ALS models you build using PySpark.
Introduction to the MovieLens dataset
50 xp
Viewing the MovieLens Data
100 xp
Calculate sparsity
100 xp
The GroupBy and Filter methods
100 xp
MovieLens Summary Statistics
100 xp
View Schema
100 xp
ALS model buildout on MovieLens Data
50 xp
Create test/train splits and build your ALS model
100 xp
Tell Spark how to tune your ALS model
100 xp
Build your cross validation pipeline
100 xp
Best Model and Best Model Parameters
100 xp
Model Performance Evaluation
50 xp
Generate predictions and calculate RMSE
100 xp
Interpreting the RMSE
50 xp
Do recommendations make sense
100 xp
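
Two of the quantities named in this chapter's lessons, sparsity and RMSE, are simple enough to sketch without Spark. The rows below are hypothetical and are not MovieLens data, and the function names are made up for the example. Sparsity is taken here to mean the share of user/movie pairs that have no rating, which is an assumption about how the course defines it.

```python
import math

# Hypothetical (userId, movieId, rating) rows for illustration only.
ratings = [
    (1, 10, 4.0), (1, 20, 3.5),
    (2, 10, 5.0),
    (3, 20, 2.0), (3, 30, 4.5),
]


def sparsity(rows):
    """Fraction of the user x movie grid that has no rating."""
    users = {u for u, _, _ in rows}
    movies = {m for _, m, _ in rows}
    possible = len(users) * len(movies)
    return 1.0 - len(rows) / possible


def rmse(actual, predicted):
    """Root mean squared error between held-out ratings and predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


grid_sparsity = sparsity(ratings)             # 5 ratings out of 9 possible pairs
holdout_rmse = rmse([4.0, 3.0], [3.0, 5.0])   # errors of 1 and 2 stars
```

An RMSE is read in the same units as the ratings, so a value of about 1.58 here means predictions are off by roughly a star and a half on this toy hold-out set.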
4
What if you don't have customer ratings?
In most real-life situations, you won't have "perfect" customer data available to build an ALS model. This chapter will teach you how to use your customer behavior data to "infer" customer ratings and use those inferred ratings to build an ALS recommendation engine. Using the Million Songs Dataset as well as another version of the MovieLens dataset, this chapter will show you how to use the data available to you to build a recommendation engine using ALS and evaluate its performance.
Introduction to the Million Songs Dataset
50 xp
Confirm understanding of implicit rating concepts
50 xp
MSD summary statistics
100 xp
Grouped summary statistics
100 xp
Add zeros
100 xp
Evaluating implicit ratings models
50 xp
Specify ALS hyperparameters
100 xp
Build implicit models
100 xp
Running a cross-validated implicit ALS model
100 xp
Extracting parameters
100 xp
Overview of binary, implicit ratings
50 xp
Binary model performance
100 xp
Recommendations from binary data
100 xp
Course recap
50 xp
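
One way to prepare behavior data as implicit ratings is sketched below. It may or may not match what the "Add zeros" and binary-ratings lessons do; that is an assumption based on the lesson titles. The play counts and variable names are hypothetical. The idea is to list every user/song pair, give unobserved pairs a count of 0, and optionally collapse counts to a 1/0 "interacted or not" signal.

```python
from itertools import product

# Hypothetical (userId, songId, num_plays) rows; not Million Songs data.
plays = [
    ("u1", "s1", 12), ("u1", "s2", 1),
    ("u2", "s2", 7),
    ("u3", "s3", 3),
]

observed = {(u, s): n for u, s, n in plays}
users = sorted({u for u, _, _ in plays})
songs = sorted({s for _, s, _ in plays})

# Every user/song combination; pairs with no recorded plays get 0.
full = [(u, s, observed.get((u, s), 0)) for u, s in product(users, songs)]

# Binary variant: 1 if the user played the song at all, otherwise 0.
binary = [(u, s, 1 if n > 0 else 0) for u, s, n in full]
```

With three users and three songs, `full` holds nine rows, five of which are the added zeros. On a real dataset the full grid can be very large, so this step is usually done with a distributed join rather than in local memory.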

In the following tracks

Big Data with PySpark

Collaborators

Lore Dirick

Nick Solomon

Adrián Soto

Prerequisites

Introduction to PySpark, Supervised Learning with scikit-learn

Jamen Long

Data Scientist
