Skip to content
Data-Driven Decision Making in SQL
Data-Driven Decision Making in SQL
Here you can access every table used in the course. To access each table, you will need to specify the movies schema in your queries (e.g., movies.movies for the movies table, and movies.customers for the customers table).
Take Notes
Add notes about the concepts you've learned and SQL cells with queries you want to keep.
Data-driven decision making acheives short-term and long term goals
- we can extract valueable information from data to support operational short term decisions. ie. popularity of certain actors to decide whether to purchase certain movies.
- long-term decisions, data-driven support can provide information regarding customer growth and success in certain regions in the past which can drive decisions about when and where market can be expanded in the future.
- KPIs: Key Performance Indicators
- help the company (or its subdivision) define and monitor success.
- Success KPI may be revenue
- for subdivision customor relations management, the KPI could be quantified by average rating of all movies
- KPI customer engagement = the number of active customers in a certain perios of time
- help the company (or its subdivision) define and monitor success.
Add your notes here
DataFrameas
actors
variable
-- You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.
SELECT
COUNT(*) AS number_renting,
AVG(rating) AS average_rating,
COUNT(rating) AS number_ratings -- Add the total number of ratings here.
FROM movies.renting
WHERE date_renting >= '2019-01-01';DataFrameas
df
variable
-- Conduct an analysis to see when the first customer accounts were created for each country.
SELECT country,
MIN(date_account_start) AS first_account
FROM customers
GROUP BY country
ORDER BY first_account;DataFrameas
df
variable
--For each movie the average rating, the number of ratings and the number of views has to be reported. Generate a table with meaningful column names.
SELECT movie_id,
AVG(rating) AS avg_rating,
COUNT(rating) AS number_ratings,
COUNT(*) AS number_renting
FROM renting
GROUP BY movie_id
ORDER BY avg_rating desc; -- Order by average rating in decreasing orderDataFrameas
df
variable
--The management of MovieNow wants to report key performance indicators (KPIs) for the performance of the company in 2018. They are interested in measuring the financial successes as well as user engagement. Important KPIs are, therefore, the profit coming from movie rentals, the number of movie rentals and the number of active customers.
SELECT
SUM(m.renting_price),
COUNT(*),
COUNT(DISTINCT r.customer_id)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
-- Only look at movie rentals in 2018
WHERE date_renting BETWEEN '2018-01-01'AND'2018-12-31' ;DataFrameas
df
variable
--Report the date of birth of the oldest and youngest US actor and actress.
SELECT a.gender, -- Report for male and female actors from the USA
MAX(year_of_birth), -- The year of birth of the oldest actor
MIN(year_of_birth) -- The year of birth of the youngest actor
FROM
(SELECT * -- Use a subsequen SELECT to get all information about actors from the USA
FROM actors
WHERE nationality = 'USA') AS a -- Give the table the name a
GROUP BY a.gender;DataFrameas
df
variable
--Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.
SELECT m.title,
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
HAVING COUNT(*)>'1' -- Remove movies with only one rental
ORDER BY AVG(r.rating); -- Order with highest rating firstDataFrameas
df
variable
--Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
SELECT
c.country, -- For each country report
COUNT(*) AS number_renting, -- The number of movie rentals
AVG(rating) AS average_rating, -- The average rating
SUM(m.renting_price) AS revenue -- The revenue from movie rentals
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
GROUP BY c.country;DataFrameas
df
variable
--List all movies with more than 5 views using a nested query which is a powerful tool to implement selection conditions.
SELECT *
FROM movies
WHERE movie_id IN -- Select movie IDs from the inner query
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(*) > 5)DataFrameas
df
variable
--Report a list of customers who frequently rent movies on MovieNow.
SELECT *
FROM customers
WHERE customer_id IN -- Select all customers with more than 10 movie rentals
(SELECT customer_id
FROM renting
GROUP BY customer_id
HAVING COUNT(*)>10);DataFrameas
df
variable
--For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
SELECT title -- Report the movie titles of all movies with average rating higher than the total average
FROM movies
WHERE movie_id IN
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >
(SELECT AVG(rating)
FROM renting));DataFrameas
df
variable
--A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.
SELECT *
FROM customers as c
WHERE 5 >
(SELECT count(*)
FROM renting as r
WHERE r.customer_id = c.customer_id);