Skip to content
LEGOS: Success Over the Years
LEGOS: Success Over the Years
"Only the Best is Good Enough"
The goal of this project is to explore the lego data from 1970 to 2022 to identify the evolution of Lego as a company creating products for young children to adults.
Questions to guide this exploration:
- What are the largest Themes/theme groups?
- What are the most expensive/least expensive sets and do those sets get more expensive over time?
- Average prices and peices by year, theme groups
First, I want to explore the data and identify the different themeGroups and themes.
DataFrameas
df
variable
--Explore data and columns
Select *
From lego_sets.csv as lego
DataFrameas
df1
variable
-- Count by ThemeGroup
Select themeGroup, Count(themeGroup) as total
From lego_sets.csv
Group by themeGroup
Order by Count(themeGroup) DESC
DataFrameas
df7
variable
--Count by Theme
Select theme, Count(theme) as total
From lego_sets.csv
Group by theme
Order by Count(theme) DESC
The Miscellaneous themeGroup was the largest so I drilled down to see what themes were considered miscellaneous.
DataFrameas
df3
variable
--MISCELLANEOUS Theme group drill down
SELECT name, theme, subtheme
FROM lego_sets.csv
WHERE themeGroup = 'Miscellaneous'
GROUP BY name, theme, subtheme;
Then I wanted to look into the average prices, average number of pieces and average minimum age range. This I can use to compare and identfy the least or most expensive sets Lego makes.
DataFrameas
df5
variable
[3]
-- Find average # of pieces for sets and average price
SELECT Round(AVG(pieces)) as avg_num_pieces, ROUND(AVG(US_retailPrice), 2) as avg_price, ROUND(AVG(agerange_min), 1) as avg_age_min
FROM lego_sets.csv
DataFrameas
df2
variable
-- Top 10 Most expensive sets and year
Select name, year, pieces, max(US_retailPrice) as max_price
From lego_sets.csv
WHERE US_retailPrice is not null
Group by name, year, pieces
Order by max(US_retailPrice) DESC
Limit 10
DataFrameas
df4
variable
-- Top 10 Least expensive sets and year
Select name, year, pieces, min (US_retailPrice) as min_price
From lego_sets.csv
WHERE US_retailPrice is not null
Group by name, year, pieces
Order by min(US_retailPrice) ASC
Limit 10
DataFrameas
df6
variable
---Find average pieces and price group by year
SELECT year, Round(AVG(pieces)) as avg_num_pieces, ROUND(AVG(US_retailPrice), 2) as avg_price
FROM lego_sets.csv
Group by year
Order by ROUND(AVG(US_retailPrice), 2) DESC
DataFrameas
df8
variable
---AVG price per different theme group (what sells)
SELECT themeGroup, ROUND(AVG(US_retailPrice), 2) as avg_price
From lego_sets.csv
Group by themeGroup
Order by ROUND(AVG(US_retailPrice), 2) DESC
Exploring Licensed Themes
Questions:
- What percentage of lego sets are Licensed themes?
- What are the top licensed themes and how many sets per theme?
- Did Lego increase their production of licensed themes over the years?
DataFrameas
df11
variable
SELECT Round((2509/Count(themeGroup) * 100), 2) as percent_licensed
FROM lego_sets.csv