Skip to content

Harry Potter Series Analysis

Since I'm a huge fan of Harry Potter series, I'm gonna be doing a small analysis on it using the two datasets already integrated in this workspace: movies and cinema to see what I can dig out from it using knowledge on Subqueries, Joins, Window Functions, CTEs, Aggregate Functions, CASE, etc. (๑˘︶˘๑)

What is the average rating of each movie part?

Let's first see if many others enjoy the series as much as I do.

Spinner
DataFrameas
avg_rating
variable
SELECT
    RANK() OVER (ORDER BY year_of_release) AS part,
    movies.year_of_release,
    movies.title,
    ROUND(AVG(renting.rating),2) AS avg_rating
    
FROM movies.movies
INNER JOIN movies.renting
    USING(movie_id)
    
WHERE title LIKE '%Harry Potter%'
GROUP BY movies.year_of_release, movies.title
ORDER BY avg_rating DESC;

The average rating for each movie are pretty high to me 👍 People seem to love the first and sixth part of the series. Let's see if they generated an equivalent gross revenue or not.

What is the profit each movie part generated?

Spinner
DataFrameas
profit
variable
SELECT
    RANK() OVER(ORDER BY release_year) AS part,
    title, gross, budget, 
    gross - budget AS profit
FROM cinema.films
WHERE title LIKE '%Harry Potter%'
ORDER BY gross DESC;

When excluding the two-part movie Deathly Hallows:

  • Looks like Sorcerer's Stone and Half-Blood Prince are really worth the hype as their revenue stay the highest and second-highest overall.

  • However, Half-Blood Prince did not seem to bring back much profit as it was not even close to half of what Sorcerer's Stone generated, which is probably due to its massive budget.

  • Chamber of Secrets and Prisoner of Azkaban despite having a pretty high score generated much lower revenue. It's a pity 'cause I absolutely love these two parts T^T. But still, the second movie was actually doing pretty great with its second place in profit recorded.

Which movie part is the most rented?

Now that we which movie can be regarded as 'most successful' out in cinema, let's see if their popularity remain in this renting store as well.

Spinner
DataFrameas
df
variable
SELECT
    m.title, 
    COUNT(*) AS times_rented
    
FROM movies.movies AS m
INNER JOIN movies.renting AS r
    USING(movie_id)
    
WHERE m.title LIKE '%Harry Potter%'
GROUP BY m.title
ORDER BY times_rented DESC;

Wow Half-Blood Prince must be really loved by this store's customers, too. But wait a minute.. is there any chance it was rented more for having a cheaper renting price than the others? Let's find out.

Spinner
DataFrameas
df
variable
-- Reuse the previous query to create a CTE
WITH num_rent AS (SELECT
                       m.title, 
                       COUNT(*) AS times_rented

                   FROM movies.movies AS m
                   INNER JOIN movies.renting AS r
                       USING(movie_id)
    
                   WHERE m.title LIKE '%Harry Potter%'
                   GROUP BY m.title
                   ORDER BY times_rented DESC)

SELECT 
    num_rent.title,
    num_rent.times_rented,
    m.renting_price
       
FROM num_rent
INNER JOIN movies.movies AS m
    ON num_rent.title = m.title

ORDER BY num_rent.times_rented DESC;

Our speculation has been confirmed! Half-Blood Prince is rented the most probably due to its cheapest renting price. But the majority of renters seemed to have enjoyed the movie resulting its second-highest rating score as shown at the beginning of the analysis.

Who are the actors and directors of the series?

Obviously aside from the famous trio, there are a lot of other characters played by different actors and actress. Also, to make the magical world realistic involves the help of directors, too. Let's see who they are!

Spinner
DataFrameas
df
variable
SELECT 
    people.name, 
    actor_director.role
    
FROM (SELECT 
          DISTINCT(r.person_id), 
          r.role
      FROM cinema.roles AS r
      WHERE r.film_id IN (SELECT 
                              f.id
                          FROM cinema.films AS f
                          WHERE f.title LIKE '%Harry Potter%')) AS actor_director
INNER JOIN cinema.people
ON actor_director.person_id = people.id

ORDER BY role;