Skip to content
0

Understanding LEGO sets popularity

Background

The goals of this project is to become familiarized with the popularity of different LEGO sets and themes. Specifically:

  1. How many sets and parts are made per year?
  2. What are the most popular colors?
  3. What proportion of parts are transparent?
  4. What are the rarest LEGO parts?
  5. What are the most popular themes?
  6. What are the biggest LEGO sets? The database used for this analysis came from Rebrickable and contains data up to 2017.

Sets Released per Year

Spinner
DataFrameas
by_year
variable
SELECT year AS "Year", COUNT(set_num) AS "Number of Sets"
FROM sets
GROUP BY year
ORDER BY year;

There is a general trend of more sets per year as time goes on, with some fluctuations. The number of sets for 2017 is only half of that for 2016, but since this is the last year in the set, it may not be representative of the entire year, excluding some sets.

Parts Created per Year

Spinner
DataFrameas
df1
variable
SELECT sets.year AS "Year", COUNT(part_num) AS "Number Unique Parts"
FROM sets
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
GROUP BY sets.year

The number of unique parts per year has been steadily increasing over time, with much less fluctuation than the number of sets. This indicates that even though fewer sets might be being made, those sets contain more unique parts. The number of parts that can be made is perhaps the limiting factor in how many different sets can be made, even as production allows for more parts to be made.

In 2002, 447 sets were made, but this decreased to 283 by 2006. But at the same time, unique part production increased from 13,741 to 18,091.

Spinner
DataFrameas
df2
variable
SELECT sets.name AS "Set name", sets.year AS "Year", COUNT(part_num) AS "Unique parts"
FROM sets 
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
WHERE sets.year IN (2002, 2006)
GROUP BY sets.name, sets.year
ORDER BY COUNT(part_num) DESC
LIMIT 10;

The set with the largest number of unique parts in 2002 had 237, but 2006 had several sets with more unique parts, topping out at 310. This shows a shift from producing many sets with fewer parts, to producing fewer sets with more unique parts.

LEGO Part Colors

Spinner
DataFrameas
df3
variable
SELECT colors.name AS "Color", COUNT(inventory_parts.quantity) AS "Number of Parts"
FROM inventory_parts 
INNER JOIN colors ON inventory_parts.color_id = colors.id
GROUP BY colors.name
ORDER BY "Number of Parts" DESC
LIMIT 5; 

Black, white, and gray pieces are the most common colors, which is unsurprising considering they are probably the most versatile pieces. Red pieces are also very common.

Percent Transparent Pieces

Spinner
DataFrameas
df6
variable
WITH parts_trans AS (
	SELECT COUNT(DISTINCT(part_num))::float AS parts_trans
	FROM inventory_parts 
	INNER JOIN colors ON inventory_parts.color_id = colors.id
	WHERE colors.is_trans = TRUE),

parts_all AS(
	SELECT COUNT(DISTINCT(part_num))::float AS parts_all
	FROM inventory_parts)
	
SELECT (parts_trans / parts_all) * 100 as "Percent Transparent"
FROM parts_trans, parts_all;

Transparent LEGO pieces are uncommon, only making up 6.3% of all parts produced.