Skip to content

Course Notes

Use this workspace to take notes, store code snippets, and build your own interactive cheatsheet!

# Import any packages you want to use here
spark.read.jdbc("jdbc:postgresql://localhost:5432/pagila",
                "customer",
                {"user":"repl","password":"password"})

Take Notes

Add notes here about the concepts you've learned and code cells with code you want to keep.

Joining the film and ratings tables to create a new column that stores the average rating per customer.

# Add your code snippets here
# Use groupBy and mean to aggregate the column
ratings_per_film_df = rating_df.groupBy('film_id').mean('rating')

# Join the tables using the film_id column
film_df_with_ratings = film_df.join(
    ratings_per_film_df,
    film_df.film_id==ratings_per_film_df.film_id
)

# Show the 5 first results
print(film_df_with_ratings.show(5))