Skip to content

Data-Driven Decision Making in SQL

Here you can access every table used in the course. To access each table, you will need to specify the movies schema in your queries (e.g., movies.movies for the movies table, and movies.customers for the customers table).

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Data-driven decision making acheives short-term and long term goals

  • we can extract valueable information from data to support operational short term decisions. ie. popularity of certain actors to decide whether to purchase certain movies.
  • long-term decisions, data-driven support can provide information regarding customer growth and success in certain regions in the past which can drive decisions about when and where market can be expanded in the future.
  • KPIs: Key Performance Indicators
    • help the company (or its subdivision) define and monitor success.
      • Success KPI may be revenue
      • for subdivision customor relations management, the KPI could be quantified by average rating of all movies
      • KPI customer engagement = the number of active customers in a certain perios of time

Add your notes here

Spinner
DataFrameas
actors
variable
-- You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.

SELECT 
	COUNT(*) AS number_renting,
	AVG(rating) AS average_rating, 
    COUNT(rating) AS number_ratings -- Add the total number of ratings here.
FROM movies.renting
WHERE date_renting >= '2019-01-01';
Spinner
DataFrameas
df
variable
-- Conduct an analysis to see when the first customer accounts were created for each country.
SELECT country, 
	MIN(date_account_start) AS first_account
FROM customers
GROUP BY country
ORDER BY first_account;
Spinner
DataFrameas
df
variable
--For each movie the average rating, the number of ratings and the number of views has to be reported. Generate a table with meaningful column names.
SELECT movie_id, 
       AVG(rating) AS avg_rating,
       COUNT(rating) AS number_ratings,
       COUNT(*) AS number_renting
FROM renting
GROUP BY movie_id
ORDER BY avg_rating desc; -- Order by average rating in decreasing order
Spinner
DataFrameas
df
variable
--The management of MovieNow wants to report key performance indicators (KPIs) for the performance of the company in 2018. They are interested in measuring the financial successes as well as user engagement. Important KPIs are, therefore, the profit coming from movie rentals, the number of movie rentals and the number of active customers.

SELECT 
	SUM(m.renting_price), 
	COUNT(*), 
	COUNT(DISTINCT r.customer_id)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
-- Only look at movie rentals in 2018
WHERE date_renting BETWEEN '2018-01-01'AND'2018-12-31' ;
Spinner
DataFrameas
df
variable
--Report the date of birth of the oldest and youngest US actor and actress.

SELECT a.gender, -- Report for male and female actors from the USA 
       MAX(year_of_birth), -- The year of birth of the oldest actor
       MIN(year_of_birth) -- The year of birth of the youngest actor
FROM
   (SELECT * -- Use a subsequen SELECT to get all information about actors from the USA
   FROM actors
   WHERE nationality = 'USA') AS a -- Give the table the name a
GROUP BY a.gender;
Spinner
DataFrameas
df
variable
--Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.

SELECT m.title, 
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
HAVING COUNT(*)>'1' -- Remove movies with only one rental
ORDER BY AVG(r.rating); -- Order with highest rating first
Spinner
DataFrameas
df
variable
--Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
SELECT 
	c.country,                    -- For each country report
	COUNT(*) AS number_renting, -- The number of movie rentals
	AVG(rating) AS average_rating, -- The average rating
	SUM(m.renting_price) AS revenue         -- The revenue from movie rentals
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
GROUP BY c.country;
Spinner
DataFrameas
df
variable
--List all movies with more than 5 views using a nested query which is a powerful tool to implement selection conditions.

SELECT *
FROM movies
WHERE movie_id IN  -- Select movie IDs from the inner query
	(SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(*) > 5)
Spinner
DataFrameas
df
variable
--Report a list of customers who frequently rent movies on MovieNow.
SELECT *
FROM customers
WHERE customer_id IN            -- Select all customers with more than 10 movie rentals
	(SELECT customer_id
	FROM renting
	GROUP BY customer_id
	HAVING COUNT(*)>10);
Spinner
DataFrameas
df
variable
--For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
SELECT title -- Report the movie titles of all movies with average rating higher than the total average
FROM movies
WHERE movie_id IN
	(SELECT movie_id
	 FROM renting
     GROUP BY movie_id
     HAVING AVG(rating) > 
		(SELECT AVG(rating)
		 FROM renting));
Spinner
DataFrameas
df
variable
--A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.

SELECT *
FROM customers as c
WHERE 5 > 
	(SELECT count(*)
	FROM renting as r
	WHERE r.customer_id = c.customer_id);