Skip to content
Competition - Lego sets
  • AI Chat
  • Code
  • Report
  • Exploring Lego Set Popularity

    Background

    Legos! founded in Denmark in 1932, has become a household name and a cherished toy all around the world. The brightly coloured bricks and figurines allow children to express their creativity and imagination as they build everything from small houses to fortresses. Lego continues to inspire and thrill generations of children (and adults), and it is one of the most iconic and cherished toys of all time.

    This project focuses on exploring some data to provide some insights regarding Lego.

    Research Questions

    Using PostgreSQL and Jupyter Notebook, with Tableau for visualisation, the following questions will be answered:

    1. What is the average number of Lego sets released per year over the total time span of our data?
    2. What is the average number of Lego parts over total time span and per year?
    3. What are the 5 most popular colors used in Lego parts?
    4. What proportion of Lego parts are transparent?
    5. What are the 5 rarest categories of Lego bricks?
    6. A summary of the findings.

    💾Database Description

    inventory_parts
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "part_num" - unique id for the part (as in the parts table)
    • "color_id" - id of the color
    • "quantity" - the number of copies of the part included in the set
    • "is_spare" - whether or not it is a spare part
    parts
    • "part_num" - unique id for the part (as in the inventory_parts table)
    • "name" - name of the part
    • "part_cat_id" - part category id (as in part_catagories table)
    part_categories
    • "id" - part category id (as in parts table)
    • "name" - name of the category the part belongs to
    colors
    • "id" - id of the color (as in inventory_parts table)
    • "name" - color name
    • "rgb" - rgb code of the color
    • "is_trans" - whether or not the part is transparent/translucent
    inventories
    • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
    • "version" - version number
    • "set_num" - set number (as in sets table)
    inventory_sets
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "set_num" - set number (as in sets table)
    • "quantity" - the quantity of sets included
    sets
    • "set_num" - unique set id (as in inventory_sets and inventories tables)
    • "name" - the name of the set
    • "year" - the year the set was published
    • "theme_id" - the id of the theme the set belongs to (as in themes table)
    • num-parts - the number of parts in the set
    themes
    • "id" - the id of the theme (as in the sets table)
    • "name" - the name of the theme
    • "parent_id" - the id of the larger theme, if there is one

    The data was provided by Rebrickable.com and be accessed through https://rebrickable.com/downloads/

    DataBase Schema

    The relationships between the tables in the Lego database are shown in the schema below.

    Analysis

    1. What is the average number of Lego sets released per year over the total time span of our data?

    To calculate the average number of Lego sets released per year, we'll follow these steps :

    1. Determine the total amount of sets (aliased here as total_sets)
    2. Determine the total amount of years (aliased here as total_years)
    3. Calculate the average by dividing the total amount of sets by the total amount of years.
    Unknown integration
    DataFrameavailable as
    df1
    variable
    SELECT COUNT(*) AS total_sets, MAX(year) - MIN(year) + 1 AS total_years, COUNT(DISTINCT set_num) / (MAX(year) - MIN(year) + 1) AS avg_sets_per_year
    FROM sets;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    2. What is the average number of Lego parts over total time span and per year?

    2.1 What is the average number of Lego parts over the total time span of our data?

    We can calculate the average number of Lego parts as follows :

    1. Determine the total amount of parts (aliased here as total_parts),
    2. Determine the total amount of years (aliased here as total_years),
    3. Calculate the average by dividing the total amount of parts by the total amount of years.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT tp.total_parts/ty.total_years as avg_num_lego_parts,  tp.total_parts, ty.total_years
    FROM 
    	(SELECT SUM(num_parts) AS total_parts
    	 FROM sets) AS tp,  
    	 (SELECT COUNT(DISTINCT(year)) As total_years
    	 FROM sets)  AS ty
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    2.2 What is the average number of Lego parts per year?

    We can calculate this average number of Lego parts as follows :

    1. Determine the average amount of parts
    2. Round to a full number
    3. Group per year and order per year to see the evolution throughout the years.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT ROUND(AVG(num_parts)) As Average_number_of_parts, year AS production_year
    FROM sets 
    GROUP BY year
    ORDER BY year 
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    3. What are the 5 most popular colors used in Lego parts?

    To calculate the 5 most popular colors in Lego, we must join the 'inventory_parts' table with the 'colors' table.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT colors.name AS color, COUNT(*)
    		--join inventory parts with the colors table
    FROM inventory_parts
    	INNER JOIN colors
        	ON inventory_parts.color_id = colors.id
    		--display total counts by color name
    GROUP BY colors.name
    		--order by descending count and limit results to top 5
    ORDER BY COUNT(*) DESC
    LIMIT 5
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    4. What proportion of lego parts are transparent?

    To find this, I will create a common table expression. CTE.

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- CTE for transparent parts. YOU JUST COPIED THIS. STUDY PLEASE.
    WITH transparent_parts AS (
        SELECT COUNT(DISTINCT ip.part_num)::float AS transparent_parts
        FROM inventory_parts  AS ip
        JOIN colors AS c 
    	ON ip.color_id = c.id
        WHERE c.is_trans = true),
    
    -- CTE for all the parts
    all_parts AS (
        SELECT COUNT(DISTINCT part_num) AS all_parts
        FROM inventory_parts)
    
    -- Calculate proportion
    	SELECT (transparent_parts/all_parts) * 100 AS proportion_transparent_parts
    	FROM transparent_parts, all_parts
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    A small proportion of Lego bricks is transparant, more specifically about 6.3%.