Interactive Course

Building Recommendation Engines with PySpark

Learn tools and techniques to leverage your own big data to facilitate positive experiences for your users.

  • 4 hours
  • 15 Videos
  • 56 Exercises
  • 3,986 Participants
  • 4,550 XP

Loved by learners at thousands of top companies:

  • Forrester
  • Deloitte
  • EA
  • 3M
  • AXA
  • Siemens

Course Description

This course will show you how to build recommendation engines using Alternating Least Squares in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, this course will take you step by step through the intuition of the Alternating Least Squares algorithm as well as the code to train, test and implement ALS models on various types of customer data.
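
To give a feel for the alternating idea before the chapters begin, here is a minimal sketch of rank-1 ALS in plain Python. It is an illustration only, not the course's code or Spark's implementation: the ratings, the names `solve_side`, `als_rank1` and `rmse`, and the settings `iterations=20` and `reg=0.1` are all assumptions made for this example.

```python
# Rank-1 alternating least squares on a tiny, made-up ratings table.
# Pure Python for illustration only; this is not Spark's ALS implementation.

ratings = {  # (user, item) -> explicit rating; hypothetical data
    ("ann", "m1"): 5.0, ("ann", "m2"): 4.0,
    ("bob", "m1"): 4.0, ("bob", "m3"): 2.0,
    ("cat", "m2"): 2.0, ("cat", "m3"): 1.0,
}

def solve_side(ratings, fixed, side, reg):
    """Closed-form ridge update for one side while the other side is held fixed."""
    num, den = {}, {}
    for (user, item), r in ratings.items():
        key, other = (user, item) if side == "user" else (item, user)
        f = fixed[other]
        num[key] = num.get(key, 0.0) + r * f
        den[key] = den.get(key, 0.0) + f * f
    return {k: num[k] / (den[k] + reg) for k in num}

def als_rank1(ratings, iterations=20, reg=0.1):
    """Alternate between solving for user factors and item factors."""
    items = {item for _, item in ratings}
    item_f = {item: 1.0 for item in items}  # simple constant initialisation
    user_f = {}
    for _ in range(iterations):
        user_f = solve_side(ratings, item_f, "user", reg)  # fix items, solve users
        item_f = solve_side(ratings, user_f, "item", reg)  # fix users, solve items
    return user_f, item_f

def rmse(ratings, user_f, item_f):
    """Root mean squared error over the observed ratings only."""
    errors = [(r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items()]
    return (sum(errors) / len(errors)) ** 0.5

user_f, item_f = als_rank1(ratings)

# Predicted score for a (user, item) pair that was never rated.
prediction = user_f["ann"] * item_f["m3"]
print(round(rmse(ratings, user_f, item_f), 3), round(prediction, 2))
```

With a single latent feature, each update reduces to a regularized weighted average. With more latent features, each update becomes a small regularized least-squares solve per user or per item, which is the kind of work a distributed engine such as Spark can spread across a cluster.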

  1. Recommendations Are Everywhere

    Free

    This chapter will show you how powerful recommendation engines can be, and explain the important distinctions between collaborative-filtering engines and content-based engines, as well as the different types of implicit and explicit data that recommendation engines can use. You will also learn a very powerful way to uncover hidden features (latent features) that you may not even know exist in customer datasets.

  2. Recommending Movies

    In this chapter you will be introduced to the MovieLens dataset. You will walk through how to assess its use for ALS, build out a full cross-validated ALS model on it, and learn how to evaluate its performance. This will be the foundation for all subsequent ALS models you build using PySpark.

  3. How does ALS work?

    In this chapter you will review basic concepts of matrix multiplication and matrix factorization, and dive into how the Alternating Least Squares algorithm works and what arguments and hyperparameters it uses to return the best recommendations possible. You will also learn important techniques for properly preparing your data for ALS in Spark.

  4. What if you don't have customer ratings?

    In most real-life situations, you won't have "perfect" customer data available to build an ALS model. This chapter will teach you how to use your customer behavior data to "infer" customer ratings and use those inferred ratings to build an ALS recommendation engine. Using the Million Songs Dataset as well as another version of the MovieLens dataset, this chapter will show you how to use the data available to you to build a recommendation engine using ALS and evaluate its performance.
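
The idea of "inferring" ratings from behavior, described in the last chapter above, can be sketched in a few lines of plain Python. This is a hypothetical example, not the course's code: the event log, the `infer_ratings` helper, and the choice of raw play counts as the inferred rating are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical listening log: one (user_id, song_id) tuple per play event.
play_events = [
    (1, 10), (1, 10), (1, 11),
    (2, 10), (2, 12), (2, 12), (2, 12),
    (3, 11),
]

def infer_ratings(events):
    """Count plays per (user, song) pair and use the count as an implicit rating."""
    counts = Counter(events)
    return [(user, song, float(n)) for (user, song), n in sorted(counts.items())]

implicit_ratings = infer_ratings(play_events)
for row in implicit_ratings:
    print(row)  # (user_id, song_id, inferred_rating)
```

Rows shaped like these (user, item, rating) could then be loaded into a Spark DataFrame. The `ALS` estimator in `pyspark.ml.recommendation` has an `implicitPrefs` parameter intended for this kind of count-based feedback.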

What do other learners have to say?

“I've used other sites, but DataCamp's been the one that I've stuck with.”

Devon Edwards Joseph

Lloyd's Banking Group

“DataCamp is the top resource I recommend for learning data science.”

Louis Maiden

Harvard Business School

“DataCamp is by far my favorite website to learn from.”

Ronald Bowers

Decision Science Analytics @ USAA

Jamen Long

Data Scientist

Jamen is a data scientist with experience building machine learning models to predict and guide customers’ physical and digital shopping journeys. Having started his data science journey with DataCamp years ago, Jamen enjoys continuing to learn new applications of algorithms and data science frameworks.

Collaborators
  • Lore Dirick
  • Nick Solomon
  • Adrián Soto
