Skip to content

Data-Driven Decision Making in SQL

Here you can access every table used in the course. To access each table, you will need to specify the movies schema in your queries (e.g., movies.movies for the movies table, and movies.customers for the customers table).

Spinner
DataFrameas
df
variable
-- check the tables in the table_schema='movies'
SELECT table_name, table_schema, table_type
FROM information_schema.tables
where table_schema = 'movies'
ORDER BY table_name ASC;

Screen the tables and the columns

Spinner
DataFrameas
actors
variable
-- explore the tables
SELECT *
FROM movies.actors
LIMIT 5
Spinner
DataFrameas
df
variable
select * 
from movies.movies
limit 5
Spinner
DataFrameas
df
variable
select *
from movies.renting 
limit 5
Spinner
DataFrameas
df
variable
select *
from movies.customers
limit 5
Spinner
DataFrameas
df
variable
select *
from movies.actsin
limit 5

Explore the renting table to see the number of renting records, number of distinct customer, and number of ratings

Spinner
DataFrameas
df
variable
--renting table is the main table which store the renting data for the store
select 
count(*) as no_renting
, count(distinct customer_id) as unique_customer
, count(rating) as no_rating
from movies.renting

Average rating per movie

For each movie the average rating, the number of ratings and the number of views has to be reported
Join movies table for the movie title

Spinner
DataFrameas
df
variable
select
m.title
, avg(rating) as avg_rating
, count(rating) as no_rating
, count(*) as no_renting
from movies.renting as r
left join movies.movies as m
ON r.movie_id = m.movie_id
group by m.title
having  avg(rating) is not null   -- after grouping by title, remove the movies which do not have any avg_rating
order by avg_rating DESC          --check the highest rating movie

Average rating per customer

We will report these summary statistics only for customers with more than 7 movie rentals and order them in ascending order by the average rating.

Spinner
DataFrameas
df
variable
select 
c.name
, avg(r.rating) as avg_rating
, count(r.rating) as no_rating
, count(*) as no_renting
from movies.renting as r
left join movies.customers as c
ON r.customer_id = c.customer_id
group by c.name
having count(*) >7
order by no_renting DESC
Current Type: Bar
Current X-axis: no_renting
Current Y-axis: name
Current Color: None

Top renting movies