Building a Recommender System in R

Welcome to this code-along, where we will build a system that recommends movies to users! Along the way, you'll learn how to prepare your data, explore it, and create the recommender system itself using recommenderlab. There will be time to answer any questions, so please add them!

recommenderlab is an R package that provides a framework for developing and testing recommender algorithms. It supports various algorithms, including user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), association rule-based recommendation (AR), and many more. It was developed in 2016 by Michael Hahsler. Details can be found via this link.

Load packages

library(aws.s3)
library(tidyverse)
library(qdapTools)
library(recommenderlab)

The dataset

Acknowledgements

The datasets are collected by MovieLens, a research site run by GroupLens Research at the University of Minnesota. MovieLens uses "collaborative filtering" technology to make recommendations of movies that you might enjoy and to help you avoid the ones that you won't.

There are several datasets for different purposes. For this demo, we use the full dataset from this webpage, containing user ratings and tags for 62,000 movies. The dataset was updated in 2018.

The dataset for this demo is split into three csv files: movies, ratings, and tags. All three can be joined by the movieId key.

Data Dictionary

movies

variable	class	description
movieId	numeric	The unique id of the movie
title	character	The title of the movie
genres	character	The genres the movie can be categorized in

ratings

Table with movies and their average rating. Movies that received fewer than two ratings were removed.

variable	class	description
movieId	numeric	The unique id of the movie
avg_rating	numeric	Average rating received

tags

variable	class	description
userId	numeric	A unique identifier for the user who assigned the tag
movieId	numeric	The unique id of the movie
tag	character	The tag that was given to the movie
timestamp	numeric	The timestamp at which the user assigned the tag

Load your Data

# ratings = s3read_using(FUN = read.csv, bucket = "datacamp-workspacedemo-workspacedemos3-prod", object = "lca-rec-sys/ratings_by_movie.csv")
# movies = s3read_using(FUN = read.csv, bucket = "datacamp-workspacedemo-workspacedemos3-prod", object = "lca-rec-sys/movies.csv")
# tags = s3read_using(FUN = read.csv, bucket = "datacamp-workspacedemo-workspacedemos3-prod", object = "lca-rec-sys/tags.csv")

Data preprocessing

Create two tables:

Matrix for recommender model (only numeric values)
Cleaned, fully joined table with all data for final output

1. Split genres to one genre per column per movie, only keep numeric values
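
As a minimal sketch of this step, assuming the genres column holds pipe-separated strings such as "Adventure|Children" (the MovieLens convention), the base R code below builds one 0/1 column per genre and drops the character columns. The toy rows and the names genre_list, genre_flags, and movies_numeric are illustrative, not taken from the original notebook. qdapTools, loaded above, offers mtabulate() as an alternative way to tabulate the split list.

```r
# Toy stand-in for the movies table (illustrative rows, not the real data)
movies <- data.frame(
  movieId = c(1, 2, 3),
  title   = c("Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"),
  genres  = c("Adventure|Animation|Children",
              "Adventure|Children|Fantasy",
              "Action|Crime|Thriller"),
  stringsAsFactors = FALSE
)

# Split each pipe-separated string into a character vector of genres
genre_list <- strsplit(movies$genres, "|", fixed = TRUE)

# Every distinct genre, sorted alphabetically
all_genres <- sort(unique(unlist(genre_list)))

# One row per movie, one 0/1 column per genre
genre_flags <- t(vapply(
  genre_list,
  function(g) as.integer(all_genres %in% g),
  integer(length(all_genres))
))
colnames(genre_flags) <- all_genres

# Keep only numeric columns: the id plus the genre indicators
movies_numeric <- cbind(movieId = movies$movieId, as.data.frame(genre_flags))
print(movies_numeric)
```

The title column is left out on purpose here; it can be joined back by movieId when the final output table is assembled.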

2. Add average rating to movies, filter out movies without rating
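
One way to sketch this step, assuming ratings has the movieId and avg_rating columns from the data dictionary, is an inner join: movies that do not appear in ratings are dropped automatically. The toy rows and the name movies_rated are illustrative.

```r
# Toy stand-ins for the movies and ratings tables (illustrative values)
movies <- data.frame(
  movieId = c(1, 2, 3),
  title   = c("Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"),
  stringsAsFactors = FALSE
)
ratings <- data.frame(
  movieId    = c(1, 3),
  avg_rating = c(3.9, 3.8)
)

# merge() performs an inner join by default, so movie 2 (no rating) is dropped
movies_rated <- merge(movies, ratings, by = "movieId")

# Defensive filter in case the ratings table itself contains missing averages
movies_rated <- movies_rated[!is.na(movies_rated$avg_rating), ]
print(movies_rated)
```

With the tidyverse loaded, dplyr::inner_join(movies, ratings, by = "movieId") would express the same join.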

3. Prepare dataset for recommender engine as matrix
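
A possible shape for this step is sketched below, assuming the numeric table from the earlier steps has a movieId column plus numeric feature columns. The column values and the names movie_matrix and rating_matrix are illustrative. The coercion to recommenderlab's realRatingMatrix class is an assumption about what the model needs; depending on the algorithm chosen later, a binaryRatingMatrix may be the better fit.

```r
# Toy stand-in for the numeric movie table (illustrative values)
movies_numeric <- data.frame(
  movieId    = c(1, 2, 3),
  avg_rating = c(3.9, 3.2, 3.8),
  Action     = c(0, 0, 1),
  Adventure  = c(1, 1, 0)
)

# Move the id into the row names so the matrix holds features only
feature_cols <- setdiff(names(movies_numeric), "movieId")
movie_matrix <- as.matrix(movies_numeric[, feature_cols])
rownames(movie_matrix) <- movies_numeric$movieId

print(movie_matrix)

# Optional: coerce to a recommenderlab rating matrix when the package is available
# (assumption: the model is trained on a realRatingMatrix)
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)
  rating_matrix <- as(movie_matrix, "realRatingMatrix")
  print(rating_matrix)
}
```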

4. Retrieve full list of genres as a vector
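
A short sketch of this step, again assuming pipe-separated genre strings. MovieLens uses the placeholder "(no genres listed)" for movies without genre information, so the sketch removes it from the vector; whether the original notebook does the same is not known. The toy rows are illustrative.

```r
# Toy stand-in for the movies table (illustrative rows)
movies <- data.frame(
  movieId = 1:3,
  genres  = c("Adventure|Children", "Action|Crime", "(no genres listed)"),
  stringsAsFactors = FALSE
)

# Split all genre strings, flatten them, and keep each genre once
genres <- sort(unique(unlist(strsplit(movies$genres, "|", fixed = TRUE))))

# Drop the MovieLens placeholder for movies without genre information
genres <- setdiff(genres, "(no genres listed)")
print(genres)
```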

5. Retrieve top 15 of movie tags to filter out rarely used tags
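
The sketch below reads this step as "keep only the 15 most frequently used tags across all movies", which is an interpretation of the heading rather than something the text spells out. The helper top_n_tags() is hypothetical, and the toy data uses n = 2 so the effect is visible on a handful of rows.

```r
# Toy stand-in for the tags table (illustrative values)
tags <- data.frame(
  userId    = c(1, 1, 2, 2, 3, 3),
  movieId   = c(1, 2, 1, 3, 2, 3),
  tag       = c("funny", "dark", "funny", "dark", "funny", "classic"),
  timestamp = c(1500000001, 1500000002, 1500000003,
                1500000004, 1500000005, 1500000006),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n most frequently used tags
top_n_tags <- function(tags, n = 15) {
  counts <- sort(table(tags$tag), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# Keep only rows whose tag is among the most frequent ones
top_tags <- top_n_tags(tags, n = 2)
tags_filtered <- tags[tags$tag %in% top_tags, ]

print(top_tags)
print(tags_filtered)
```

On the full dataset, calling top_n_tags(tags) with the default n = 15 would return the fifteen most common tags.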

