A Lego Lost in Time

@Fuhrerhlemon @hewelfosati @oliveryasiel321

[tags 🔖]
#📊DataAnalyst #🐘PostgreSQL #🐍Python #🐼Pandas #😎Plotly

🗒️ Note: 💪Create a report to summarize your findings. Include:

What is the average number of Lego sets released per year?

What is the average number of Lego parts per year?

Create a visualization for item 2.

What are the 5 most popular colors used in Lego parts?

[Optional] What proportion of Lego parts are transparent?

[Optional] What are the 5 rarest Lego bricks?

Summarize your findings.

Understanding Lego sets popularity

Now let's move on to the competition and challenge.

📖 Background

You recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, you received the following take-home assignment:

You are asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.

💾 The data

You received access to a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):

inventory_parts

"inventory_id" - id of the inventory the part is in (as in the inventories table)
"part_num" - unique id for the part (as in the parts table)
"color_id" - id of the color
"quantity" - the number of copies of the part included in the set
"is_spare" - whether or not it is a spare part

parts

"part_num" - unique id for the part (as in the inventory_parts table)
"name" - name of the part
"part_cat_id" - part category id (as in part_categories table)

part_categories

"id" - part category id (as in parts table)
"name" - name of the category the part belongs to

colors

"id" - id of the color (as in inventory_parts table)
"name" - color name
"rgb" - rgb code of the color
"is_trans" - whether or not the part is transparent/translucent

inventories

"id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
"version" - version number
"set_num" - set number (as in sets table)

inventory_sets

"inventory_id" - id of the inventory the part is in (as in the inventories table)
"set_num" - set number (as in sets table)
"quantity" - the quantity of sets included

sets

"set_num" - unique set id (as in inventory_sets and inventories tables)
"name" - the name of the set
"year" - the year the set was published
"theme_id" - the id of the theme the set belongs to (as in themes table)
"num_parts" - the number of parts in the set

themes

"id" - the id of the theme (as in the sets table)
"name" - the name of the theme
"parent_id" - the id of the larger theme, if there is one

Acknowledgments: Rebrickable.com
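The key relationships listed above can be sketched in pandas. This is only an illustration under stated assumptions: the rows below are hypothetical toy values that mimic the documented column names, not rows from the real database, and the variable names `parts_in_sets` and `quantity_by_color` are made up for the example.

```python
import pandas as pd

# Hypothetical toy rows that mimic the documented columns; not real data.
inventories = pd.DataFrame({
    "id": [1, 2],
    "version": [1, 1],
    "set_num": ["A-1", "B-1"],
})
inventory_parts = pd.DataFrame({
    "inventory_id": [1, 1, 2],
    "part_num": ["3001", "3002", "3001"],
    "color_id": [0, 15, 0],
    "quantity": [4, 2, 6],
    "is_spare": [False, False, False],
})
colors = pd.DataFrame({
    "id": [0, 15],
    "name": ["Black", "White"],
    "rgb": ["05131D", "FFFFFF"],
    "is_trans": [False, False],
})
sets = pd.DataFrame({
    "set_num": ["A-1", "B-1"],
    "name": ["Toy set A", "Toy set B"],
    "year": [2000, 2001],
    "theme_id": [1, 1],
    "num_parts": [6, 6],
})

# inventory_parts -> inventories -> sets, following the keys listed above.
parts_in_sets = (
    inventory_parts
    .merge(inventories, left_on="inventory_id", right_on="id")
    .merge(sets, on="set_num")
)

# inventory_parts -> colors, then the total quantity per color name.
quantity_by_color = (
    inventory_parts
    .merge(colors, left_on="color_id", right_on="id")
    .groupby("name")["quantity"].sum()
    .sort_values(ascending=False)
)
```

The same join pattern would presumably serve the color questions in the task list, with `quantity_by_color.head(5)` giving the most used colors on real data.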

🐍 Importing Modules

import pandas as pd
import plotly.express as px
from plotly import graph_objects as go
from plotly.subplots import make_subplots

Data analysis

1. What is the average number of Lego sets released per year?

SQL cell (result stored as a DataFrame in the variable lego_per_year):

select * from public.sets

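# Count the sets released in each year, newest year first
avg_per_years = lego_per_year.groupby('year')['set_num'].agg('count').reset_index()
avg_per_years = avg_per_years.sort_values('year', ascending=False)
# Overall average of those yearly counts, which is what question 1 asks for
print(avg_per_years['set_num'].mean())
avg_per_years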

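# The counts are already aggregated per year, so a bar chart plots them
# directly; a histogram would re-bin the years and sum the counts per bin.
fig = px.bar(avg_per_years, x='year', y='set_num',
             orientation='v'
             ).update_layout(
    title={'text': '𓏠 Number of Lego sets released per year'},
    yaxis_title='SETS_PER_YEAR', xaxis_title='YEAR'
    )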

fig.show()

2. What is the average number of Lego parts per year?

# Mean number of parts per set for each year, newest year first
avg_per_part = lego_per_year.groupby('year')['num_parts'].agg('mean').reset_index()
avg_per_part = avg_per_part.sort_values('year', ascending=False)
avg_per_part

fig = px.bar(avg_per_part, x='year', y='num_parts',
             orientation='v'
            ).update_layout(
    title={'text': '𓏠 Average number of Lego parts per year'},
    yaxis_title='AVG_PART_PER_YEAR', xaxis_title='YEAR'
    )
fig.show()
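The question "average number of Lego parts per year" can be read in more than one way, and the groupby above takes the per-set mean within each year. The sketch below contrasts that reading with an alternative one on hypothetical toy rows; `toy_sets` and the other names are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy rows using the "year" and "num_parts" columns of sets.
toy_sets = pd.DataFrame({
    "set_num": ["A-1", "B-1", "C-1"],
    "year": [2000, 2000, 2001],
    "num_parts": [10, 30, 50],
})

# Reading 1: mean parts per set within each year (same idea as the groupby above).
mean_parts_per_set = toy_sets.groupby("year")["num_parts"].mean()

# Reading 2: total parts released each year, then averaged across years.
total_parts_per_year = toy_sets.groupby("year")["num_parts"].sum()
avg_total_per_year = total_parts_per_year.mean()
```

On these toy rows the first reading gives 20 parts per set for 2000 and 50 for 2001, while the second gives a single figure of 45 parts per year, so it may be worth stating in the report which reading is intended.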

