Analyzing Lego data and identifying "surprises" - Strashenko Anna
  • Understanding Lego sets popularity

    Information about the data

    💾 The data

    inventory_parts
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "part_num" - unique id for the part (as in the parts table)
    • "color_id" - id of the color
    • "quantity" - the number of copies of the part included in the set
    • "is_spare" - whether or not it is a spare part
    parts
    • "part_num" - unique id for the part (as in the inventory_parts table)
    • "name" - name of the part
    • "part_cat_id" - part category id (as in part_categories table)
    part_categories
    • "id" - part category id (as in parts table)
    • "name" - name of the category the part belongs to
    colors
    • "id" - id of the color (as in inventory_parts table)
    • "name" - color name
    • "rgb" - rgb code of the color
    • "is_trans" - whether or not the part is transparent/translucent
    inventories
    • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
    • "version" - version number
    • "set_num" - set number (as in sets table)
    inventory_sets
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "set_num" - set number (as in sets table)
    • "quantity" - the quantity of sets included
    sets
    • "set_num" - unique set id (as in inventory_sets and inventories tables)
    • "name" - the name of the set
    • "year" - the year the set was published
    • "theme_id" - the id of the theme the set belongs to (as in themes table)
    • "num_parts" - the number of parts in the set
    themes
    • "id" - the id of the theme (as in the sets table)
    • "name" - the name of the theme
    • "parent_id" - the id of the larger theme, if there is one
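
The table descriptions above imply how the tables link together, for example sets.theme_id points at themes.id. A minimal sketch of that join in pandas follows. The rows are made up for illustration and are not taken from the real dataset; the names sets_with_theme and theme_name are hypothetical.

```python
import pandas as pd

# Toy rows that follow the documented column layout (values are invented).
sets = pd.DataFrame({
    "set_num": ["x-1", "y-1"],
    "name": ["Toy set A", "Toy set B"],
    "year": [2000, 2001],
    "theme_id": [1, 2],
    "num_parts": [10, 20],
})
themes = pd.DataFrame({
    "id": [1, 2],
    "name": ["Parent theme", "Child theme"],
    "parent_id": [None, 1],  # the second theme sits under the first
})

# Rename the theme columns first so they do not clash with sets.name,
# then attach the theme to each set via theme_id.
sets_with_theme = sets.merge(
    themes.rename(columns={"id": "theme_id", "name": "theme_name"}),
    on="theme_id",
    how="left",  # keep every set even if its theme is missing
)
print(sets_with_theme[["set_num", "name", "theme_name"]])
```

A left join is used here so that a set with an unknown theme_id would still appear, with an empty theme_name.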

    Acknowledgments: Rebrickable.com

    Importing libraries

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    1. What is the average number of Lego sets released per year?

    Studying the data of the sets table and counting the number of sets produced per year

    Query result available as the DataFrame variable df
    SELECT COUNT(set_num) as count_set_year, year
    FROM sets
    GROUP BY year
    ORDER BY year ASC
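
For readers working without the SQL cell, the same per-year count can be sketched in pandas. This assumes the sets table has been loaded into a DataFrame named sets; the toy rows below are invented for illustration, and df_sketch is a hypothetical name chosen so it does not shadow the report's df.

```python
import pandas as pd

# Toy stand-in for the sets table (values are invented).
sets = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1"],
    "year": [1999, 1999, 2000, 2001],
})

# Rough equivalent of:
#   SELECT COUNT(set_num) AS count_set_year, year
#   FROM sets GROUP BY year ORDER BY year ASC
df_sketch = (
    sets.groupby("year")["set_num"]
    .count()                              # non-null set_num values per year
    .reset_index(name="count_set_year")   # back to a flat two-column frame
    .sort_values("year")
    .reset_index(drop=True)
)
print(df_sketch)
```

The column order differs from the SQL result (year comes first), which does not affect the calculations that follow.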

    Visualization of data on the number of sets produced for each year

    [Bar chart: x-axis year, y-axis count_set_year]

    To estimate the current average number of sets released per year, only data from 1998 through the latest available year is used

    print('Average number of Lego sets produced per year (current calculation):',
          df[df['year'] >= 1998]['count_set_year'].mean().round(0))

    Average number of Lego sets produced per year for all years

    print('Average number of Lego sets produced per year:', df['count_set_year'].mean().round(0))
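
The two averages above differ only in the starting year, so the calculation can be wrapped in one small helper. This is a sketch, not the report's own code: mean_sets_per_year and the toy frame are hypothetical, and the counts are invented to make the arithmetic easy to check.

```python
import pandas as pd

def mean_sets_per_year(counts, start_year=None):
    """Mean of count_set_year, optionally restricted to year >= start_year."""
    if start_year is not None:
        counts = counts[counts["year"] >= start_year]
    return float(counts["count_set_year"].mean())

# Toy per-year counts in the same shape as the query result (values invented).
toy = pd.DataFrame({
    "year": [1997, 1998, 1999],
    "count_set_year": [10, 20, 40],
})

print(mean_sets_per_year(toy, start_year=1998))  # mean of 20 and 40
print(round(mean_sets_per_year(toy), 2))         # mean of 10, 20 and 40
```

With the real query result, mean_sets_per_year(df, 1998) and mean_sets_per_year(df) would correspond to the two print statements above, before rounding.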