Skip to content
Competition - Lego sets- first try of SQL
Understanding Lego sets popularity
Now let's now move on to the competition and challenge.
📖 Background
You recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, you received the following take-home assignment:
You are asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.
💾 The data
You received access to a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):
inventory_parts
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "part_num" - unique id for the part (as in the parts table)
- "color_id" - id of the color
- "quantity" - the number of copies of the part included in the set
- "is_spare" - whether or not it is a spare part
parts
- "part_num" - unique id for the part (as in the inventory_parts table)
- "name" - name of the part
- "part_cat_id" - part category id (as in part_catagories table)
part_categories
- "id" - part category id (as in parts table)
- "name" - name of the category the part belongs to
colors
- "id" - id of the color (as in inventory_parts table)
- "name" - color name
- "rgb" - rgb code of the color
- "is_trans" - whether or not the part is transparent/translucent
inventories
- "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
- "version" - version number
- "set_num" - set number (as in sets table)
inventory_sets
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "set_num" - set number (as in sets table)
- "quantity" - the quantity of sets included
sets
- "set_num" - unique set id (as in inventory_sets and inventories tables)
- "name" - the name of the set
- "year" - the year the set was published
- "theme_id" - the id of the theme the set belongs to (as in themes table)
- num-parts - the number of parts in the set
themes
- "id" - the id of the theme (as in the sets table)
- "name" - the name of the theme
- "parent_id" - the id of the larger theme, if there is one
Acknowledgments: Rebrickable.com
💪 Challenge
Create a report to summarize your findings. Include:
- What is the average number of Lego sets released per year?
- What is the average number of Lego parts per year?
- Create a visualization for item 2.
- What are the 5 most popular colors used in Lego parts?
- [Optional] What proportion of Lego parts are transparent?
- [Optional] What are the 5 rarest lego bricks?
- Summarize your findings.
💡 Learn more
The following DataCamp courses can help review the skills needed for this challenge:
✅ Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.
DataFrameas
df
variable
SELECT set_num
FROM sets
WHERE year=1960
DataFrameas
df
variable
SELECT year, COUNT(DISTINCT set_num)
FROM sets
GROUP BY year
ORDER BY year;
DataFrameas
df
variable
SELECT COUNT(DISTINCT set_num)/COUNT(DISTINCT year) AS avg_number_of_sets_per_year
FROM sets
The average number of produced sets by Lego from 1950 to 2019 is 176. **
The average number of parts used in Lego sets by year of production is shown on the chart below
DataFrameas
df
variable
SELECT year, AVG(num_parts) AS avg_parts_number
FROM sets
GROUP BY year
ORDER BY year;
The average using of parts in produced sets is vary during the time. But we can observe a litle growing trend. Lego has launched smaller sets between years 1997 and 2004. T The high average number of parts in 1960 is very conspicuous, especially when compared to the number of kits produced that year. This year, Lego released a large creative set consisting of 475 pieces, and only 2 small sets.is very conspicuous, especially when compared to the number of sets produced that year.Apart from 1987, when 209 Lego sets were released, until 1997 Lego did not release more than 200 sets per year. Therefore, when comparing the average number of parts in a set, you should also take into account the number of these.
DataFrameas
df
variable
SELECT color_id, COUNT(color_id) AS count_colors
FROM inventory_parts
GROUP BY color_id
ORDER BY count_colors DESC
LIMIT 5;
DataFrameas
df
variable
SELECT id, name
FROM colors
WHERE id =0
‌
‌
‌
‌
‌