Course Notes: Introduction to Data Engineering
Course Notes
Use this workspace to take notes, store code snippets, and build your own interactive cheatsheet!
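# Import any packages you want to use here
from pyspark.sql import SparkSession
# Reuse the active Spark session, or create one if none is running
spark = SparkSession.builder.getOrCreate()
# Read the customer table over JDBC; the credentials go in `properties`
spark.read.jdbc("jdbc:postgresql://localhost:5432/pagila",
                "customer",
                properties={"user":"repl","password":"password"})
Take Notes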
Add notes here about the concepts you've learned and code cells with code you want to keep.
Join the film and ratings tables to add a column that stores the average rating per film.
# Add your code snippets here
# Use groupBy and mean to aggregate the column
ratings_per_film_df = rating_df.groupBy('film_id').mean('rating')
# Join the tables using the film_id column
film_df_with_ratings = film_df.join(
    ratings_per_film_df,
    film_df.film_id == ratings_per_film_df.film_id
)
# Show the first 5 results (show() prints the rows itself and returns None)
film_df_with_ratings.show(5)
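To check what the aggregate-then-join computes, here is a minimal sketch in plain Python that mirrors a group-by mean followed by an inner join on film_id. It is an illustration and does not use Spark. The sample rows, the `avg_rating` key, and the helper names `mean_rating_per_film` and `join_on_film_id` are all hypothetical and do not come from the course.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows standing in for film_df and rating_df
films = [
    {"film_id": 1, "title": "Film A"},
    {"film_id": 2, "title": "Film B"},
    {"film_id": 3, "title": "Film C"},  # this film has no ratings
]
ratings = [
    {"film_id": 1, "rating": 4.0},
    {"film_id": 1, "rating": 2.0},
    {"film_id": 2, "rating": 5.0},
]

def mean_rating_per_film(rating_rows):
    """Group ratings by film_id and average them (like groupBy + mean)."""
    grouped = defaultdict(list)
    for row in rating_rows:
        grouped[row["film_id"]].append(row["rating"])
    return {film_id: mean(values) for film_id, values in grouped.items()}

def join_on_film_id(film_rows, avg_by_film):
    """Inner join: keep only films that have an average rating."""
    return [
        {**film, "avg_rating": avg_by_film[film["film_id"]]}
        for film in film_rows
        if film["film_id"] in avg_by_film
    ]

avg_by_film = mean_rating_per_film(ratings)
films_with_ratings = join_on_film_id(films, avg_by_film)
for row in films_with_ratings:
    print(row)
```

A Spark join is an inner join by default, so a film with no ratings drops out of the result, as "Film C" does here. If unrated films should be kept, passing how='left' to join is the usual option.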