Skip to content
0

Understanding Lego sets popularity

Now let's now move on to the competition and challenge.

📖 Background

You recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, you received the following take-home assignment:

You are asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.

💾 The data

You received access to a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):
inventory_parts
  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "part_num" - unique id for the part (as in the parts table)
  • "color_id" - id of the color
  • "quantity" - the number of copies of the part included in the set
  • "is_spare" - whether or not it is a spare part
parts
  • "part_num" - unique id for the part (as in the inventory_parts table)
  • "name" - name of the part
  • "part_cat_id" - part category id (as in part_catagories table)
part_categories
  • "id" - part category id (as in parts table)
  • "name" - name of the category the part belongs to
colors
  • "id" - id of the color (as in inventory_parts table)
  • "name" - color name
  • "rgb" - rgb code of the color
  • "is_trans" - whether or not the part is transparent/translucent
inventories
  • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
  • "version" - version number
  • "set_num" - set number (as in sets table)
inventory_sets
  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "set_num" - set number (as in sets table)
  • "quantity" - the quantity of sets included
sets
  • "set_num" - unique set id (as in inventory_sets and inventories tables)
  • "name" - the name of the set
  • "year" - the year the set was published
  • "theme_id" - the id of the theme the set belongs to (as in themes table)
  • num-parts - the number of parts in the set
themes
  • "id" - the id of the theme (as in the sets table)
  • "name" - the name of the theme
  • "parent_id" - the id of the larger theme, if there is one

Acknowledgments: Rebrickable.com

💪 Challenge

Create a report to summarize your findings. Include:

  1. What is the average number of Lego sets released per year?
  2. What is the average number of Lego parts per year?
  3. Create a visualization for item 2.
  4. What are the 5 most popular colors used in Lego parts?
  5. [Optional] What proportion of Lego parts are transparent?
  6. [Optional] What are the 5 rarest lego bricks?
  7. Summarize your findings.

**1. What is the average number of Lego sets released per year?

On average, 176 sets are realesed per year.

Spinner
DataFrameas
setsyear
variable
--- Count the average value of the sets produced from the subquery. 

SELECT TRUNC(AVG(sets_ayear)) as Avg_sets,
       COUNT (year) as Tot_year, 
       SUM(sets_ayear) as Tot_sets 

--- Count the amount of sets produced per year, alising it into a subquery
FROM
      (SELECT COUNT(*) AS sets_ayear, year 
       FROM sets
       GROUP by year) as sq_sets_per_year

**2.What is the average number of Lego parts per year?

On average, 28698 parts are realesed per year. The dynamic of the parts produced can be seen below the code box.

Spinner
DataFrameas
partsyear
variable
--- Count the average value of the parts produced from the subquery. 

SELECT TRUNC(AVG(parts_ayear)) as Avg_parts, 
       COUNT (year) as Tot_year, 
       SUM(parts_ayear) as Tot_parts 

--- Count the amount of parts produced per year, alising it into a subquery
FROM
      (SELECT SUM(num_parts) AS parts_ayear, year 
       FROM sets
       GROUP by year) as sq_parts_per_year
Spinner
DataFrameas
partsdetail
variable
SELECT year,
       TRUNC(AVG(num_parts)) AS parts_per_year
       
FROM sets

GROUP BY year
ORDER BY year ASC

4. What are the 5 most popular colors used in Lego parts?

Black, White, Light Bluish Gray, Red and Dark Bluish Gray.

Hidden code df

5. [Optional] What proportion of Lego parts are transparent?* 6,2% of them are transparent

Spinner
DataFrameas
df
variable
WITH transparent_parts AS (SELECT COUNT(DISTINCT ip.part_num) AS num_parts_distinct
                            FROM inventory_parts AS ip
                            LEFT JOIN colors AS c
                            ON ip.color_id=c.id
                            WHERE c.is_trans='True'
                            GROUP BY c.is_trans),
     all_parts AS (SELECT COUNT(DISTINCT part_num)::float AS all_parts
                  FROM inventory_parts)
     
SELECT (transparent_parts.num_parts_distinct/all_parts.all_parts) AS percentage
       
FROM all_parts, transparent_parts
Hidden output

6. What are the 5 rarest lego bricks?

Spinner
DataFrameas
df
variable
SELECT pc.name As name,
       SUM(ip.quantity)as tot_quantity
       
FROM part_categories AS pc
     JOIN parts as p
     ON pc.id=p.part_cat_id
     JOIN inventory_parts AS ip
     ON p.part_num = ip.part_num
    
WHERE pc.name LIKE '%rick%'
GROUP BY pc.name
ORDER BY tot_quantity ASC
LIMIT 5