LEGO - "Only the best is good enough"
The LEGO Group is a privately held company based in Billund, Denmark. The company is still owned by the Kirk Kristiansen family who founded it in 1932. The company has passed from father to son and is now owned by Kjeld Kirk Kristiansen, a grandchild of the founder.
The Lego Group's motto is "Only the best is good enough". It was created by Christiansen to encourage his employees never to skimp on quality, a value he believed in strongly. The name "LEGO" is an abbreviation of the two Danish words "leg godt", meaning "play well".
It has come a long way over the past almost 90 years - from a small carpenter’s workshop to a modern, global enterprise that is now one of the world’s largest manufacturers of toys. The products have undergone extensive development over the years – but the foundation remains the traditional LEGO brick.
Source: lego.com
Understanding Lego sets popularity
📖 Background
I work in local public government. My hobby is discovering new technology, programming, data analysis. I recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, I received the following take-home assignment:
I am asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.
💪 Challenge
This report summarizes my findings and answers the following questions:
- What is the average number of Lego sets released per year?
- What is the average number of Lego parts per year?
- Create a visualization for item 2.
- What are the 5 most popular colors used in Lego parts?
- What proportion of Lego parts are transparent?
- What are the 5 rarest lego bricks?
- Summarize your findings.
💾 The data
Database contains the following tables. (source):
inventory_parts
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "part_num" - unique id for the part (as in the parts table)
- "color_id" - id of the color
- "quantity" - the number of copies of the part included in the set
- "is_spare" - whether or not it is a spare part
parts
- "part_num" - unique id for the part (as in the inventory_parts table)
- "name" - name of the part
- "part_cat_id" - part category id (as in part_catagories table)
part_categories
- "id" - part category id (as in parts table)
- "name" - name of the category the part belongs to
colors
- "id" - id of the color (as in inventory_parts table)
- "name" - color name
- "rgb" - rgb code of the color
- "is_trans" - whether or not the part is transparent/translucent
inventories
- "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
- "version" - version number
- "set_num" - set number (as in sets table)
inventory_sets
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "set_num" - set number (as in sets table)
- "quantity" - the quantity of sets included
sets
- "set_num" - unique set id (as in inventory_sets and inventories tables)
- "name" - the name of the set
- "year" - the year the set was published
- "theme_id" - the id of the theme the set belongs to (as in themes table)
- num-parts - the number of parts in the set
themes
- "id" - the id of the theme (as in the sets table)
- "name" - the name of the theme
- "parent_id" - the id of the larger theme, if there is one
Acknowledgments: Rebrickable.com
Schema Diagram for LEGO tables
1. What is the average number of Lego sets released per year?
First we need to see a data. Let's see first 10 rows of sets table.
SELECT *
FROM sets
We can see, that there is a few record in a year. Let's see on chart the number of LEGO sets in each year. Dataframe is large, so we can see only subset visualisation.
Number of LEGO sets per year
The number of Lego sets released per year has a growing trend.
Next we'll find out the average number of Lego sets released per year. First we need to calculate the number LEGO sets per year and second calculate the mean. The result should be rounded to integer. There is only integer value of sets per year.
SELECT ROUND( AVG( set_number))
FROM
(
SELECT COUNT(*) set_number
FROM sets
GROUP BY year
) AS number_per_year
The answer to the first question is:
The average number of Lego sets released per year is 177.
2. What is the average number of Lego parts per year?
We can see, that every year has a few records. We can see num_parts column contains the number of parts in the set. We need to get average value of num_part in each year. So we need group data by the year and find out average of num_parts for rows within the group.
Numbers of part are natural numbers (positive integers), so we round the average.