Skip to content

Lego biggest sets query

Lego blocks

Since the lego company was founded on the year 1932, it has remained a key player in the toy industry and market, and a lot of diverse themes and sets have been published and purchased by avid fans. Lego sets can generally be classified into several specific topics, ranging from Star Wars themed sets to technical sets that include generic spare parts.

To analise the history of this company, the Lego database provided by datacamp has been used in this analysis to obtain various key indicators stored in the data, and specifically;

  1. The years and names of the oldest and newest lego themes
  2. The total number of lego themes and sets
  3. The top 5 themes with greater number of sets
  4. The top 10 sets published between the years 2000 and 2015 (inclusive) with more number of parts, excluding those within the all-time top 5 lego themes with more number of sets

1. Years and names of the oldest and newest lego themes

Year of oldest and newest sets

Spinner
DataFrameas
years
variable
SELECT
	MIN(year) AS min_year_available,
	MAX(year) AS max_year_available
FROM sets;

Name and year of the oldest and newest themes

Spinner
DataFrameas
df1
variable
SELECT 
	DISTINCT t.name AS theme,
	s.year
	
FROM
	themes as t
	INNER JOIN
	sets as s
	ON t.id = s.theme_id
	
WHERE
	s.year = (SELECT MIN(year) FROM sets) OR
	s.year = (SELECT MAX(year) FROM sets)
	
ORDER BY
	s.year,
	theme;

2. Total number of lego themes and sets

Number of themes

Spinner
DataFrameas
num_themes
variable
SELECT COUNT(DISTINCT(name)) AS num_themes
FROM themes;

Number of sets

Spinner
DataFrameas
num_sets
variable
SELECT COUNT(DISTINCT(name)) AS num_sets
FROM sets;

3. Top 5 lego themes with the greater number of sets

Spinner
DataFrameas
top_themes
variable
SELECT 
	t.name AS theme,
	COUNT(s.name) AS num_sets

FROM
	themes AS t
	INNER JOIN
	sets AS s
	ON t.id = s.theme_id

GROUP BY t.name
ORDER BY num_sets DESC
LIMIT 5;

4. Top 10 sets published between the years 2000 and 2015 with more parts, excluding the all-time top 5 themes

Spinner
DataFrameas
top_sets
variable
-- Set top_themes
WITH top_themes AS (
	SELECT
		t.name AS theme,
		COUNT(s.name) as num_sets
	
	FROM
		themes as t
		INNER JOIN
		sets as s
		ON t.id = s.theme_id
	
	GROUP BY t.name
	ORDER BY num_sets DESC
	LIMIT 5)

-- Get name, year, number of parts and theme of the top 10 sets published between the years 2000 and
-- 2015 (inclusive) with more number of parts and excluding the all-time top 5 most popular lego themes
SELECT 
	s.name AS set_name,
	s.year AS set_year,
	s.num_parts AS set_num_parts,
	t.name as theme
	
FROM
	sets AS s
	INNER JOIN
	themes AS t
	ON s.theme_id = t.id
	
WHERE 
	s.year BETWEEN 2000 AND 2015 AND
	t.name NOT IN (SELECT theme FROM top_themes)
	
ORDER BY set_num_parts DESC
LIMIT 10;