Skip to content
0

Understanding Lego sets popularity

📖 Background

You recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, you received the following take-home assignment:

You are asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.

Summary:

Set Production: The average yearly release of Lego sets is 176.86, with a gradual increase over time despite occasional fluctuations.

Part Production: Each year sees an average of 28,698 Lego parts, with sets incorporating more parts over time, reflecting increased complexity.

Color Diversity: White, Black, Yellow, Red, and Blue are the top colors by unique part count, indicating a varied palette across Lego products. The proportion of transparent parts remains low, comprising only 6% of all unique parts.

Rarest Bricks: Some of the rarest bricks include Brick 10 x 10 without Bottom Tubes and Brick 2 x 2 without Bottom Tubes.

💾 The data

You received access to a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):
inventory_parts
  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "part_num" - unique id for the part (as in the parts table)
  • "color_id" - id of the color
  • "quantity" - the number of copies of the part included in the set
  • "is_spare" - whether or not it is a spare part
parts
  • "part_num" - unique id for the part (as in the inventory_parts table)
  • "name" - name of the part
  • "part_cat_id" - part category id (as in part_catagories table)
part_categories
  • "id" - part category id (as in parts table)
  • "name" - name of the category the part belongs to
colors
  • "id" - id of the color (as in inventory_parts table)
  • "name" - color name
  • "rgb" - rgb code of the color
  • "is_trans" - whether or not the part is transparent/translucent
inventories
  • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
  • "version" - version number
  • "set_num" - set number (as in sets table)
inventory_sets
  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "set_num" - set number (as in sets table)
  • "quantity" - the quantity of sets included
sets
  • "set_num" - unique set id (as in inventory_sets and inventories tables)
  • "name" - the name of the set
  • "year" - the year the set was published
  • "theme_id" - the id of the theme the set belongs to (as in themes table)
  • num-parts - the number of parts in the set
themes
  • "id" - the id of the theme (as in the sets table)
  • "name" - the name of the theme
  • "parent_id" - the id of the larger theme, if there is one

Acknowledgments: Rebrickable.com

⌛️ Time is ticking. Good luck!

Lego inspires kids and develops their creative and systematic reasoning skills.

The purpose of this report to understand the popularity of different Lego sets and themes.

In this report you can find answers to the following questions:

  • What is the average number of Lego sets released per year?
  • What is the average number of Lego parts per year?
  • What are the 5 most popular colors used in Lego parts?
  • What proportion of Lego parts are transparent?
  • What are the 5 rarest lego bricks?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
Spinner
DataFrameas
df
variable
SELECT *
FROM inventory_parts
LIMIT 10;
Spinner
DataFrameas
df14
variable
SELECT *
FROM parts
WHERE name ILIKE '%brick%'
LIMIT 10;
Spinner
DataFrameas
df15
variable
SELECT *
FROM part_categories
WHERE name ILIKE '%brick%'
LIMIT 10;
Spinner
DataFrameas
df7
variable
SELECT *
FROM public.colors
Spinner
DataFrameas
df2
variable
SELECT *
FROM sets
LIMIT 10;

Average Annual Lego Set Releases Analysis

The average number of Lego sets released per year is 176.86

Spinner
DataFrameas
df3
variable
-- Calculate the average number of distinct Lego sets released per year (rounded to two decimal places).
-- The subquery groups the data in the "sets" table by the "year" column.
-- For each group (representing a specific year), it calculates the count of distinct sets (identified by "set_num")to avoid the duplicates.
-- The result is given the alias "total_sets" in this subquery.

SELECT ROUND(AVG(total_sets), 0) AS avg_sets_per_year
FROM (
    SELECT year, COUNT(DISTINCT set_num) AS total_sets
    FROM sets
    GROUP BY year
) AS yearly_sets;