Q1: What is the average number of sets per year?
To answer this query, a temporary view sets_per_year is defined. The view contains a record for each year with existing sets, and the number of sets, num_sets_per_year, produced in that year. The average number of sets per year is computed using this temporary view.
WITH sets_per_year AS
(SELECT year,
COUNT(*) AS num_sets_per_year
FROM sets
GROUP BY year)
SELECT *
FROM sets_per_year WITH sets_per_year AS
(SELECT year,
COUNT(*) AS num_sets_per_year
FROM sets
GROUP BY year)
SELECT ROUND(AVG(num_sets_per_year), 2) AS avg_num_sets_per_year
FROM sets_per_year
Q2: What is the average number of parts per year?
The average number of parst per year is 1056.5.
WITH parts_per_year AS
(SELECT s.year, COUNT(DISTINCT p.part_num) as parts_per_year
FROM parts p JOIN inventory_parts ip ON p.part_num = ip.part_num
JOIN inventories iv ON ip.inventory_id = iv.id
JOIN sets s ON s.set_num = iv.set_num
GROUP BY s.year
)
SELECT ROUND(AVG(parts_per_year), 2) AS avg_parts_per_year
FROM parts_per_year
WITH parts_per_year AS
(SELECT s.year, COUNT(DISTINCT p.part_num) AS parts_per_year
FROM parts p JOIN inventory_parts ip ON p.part_num = ip.part_num
JOIN inventories iv ON ip.inventory_id = iv.id
JOIN sets s ON s.set_num = iv.set_num
GROUP BY s.year
)
SELECT year, AVG(parts_per_year) AS avg_parts_per_year
FROM parts_per_year
GROUP BY yearQ3: What is the average number of parts per year? Create a visual.
From the line graph below, it can be noticed that the average number of parts per year has increased consistently year after year. A few noticeable drops happened in 2005, 2007 and 2017, with the one in 2017 being the largest drop. This last drop, from 4297 to 3013 is significant, nearing a 25% drop.
Parts per Year
Q4: What are the 5 most popular colors used in Lego parts?
The first step to answering this question is providing some clarity to the meaning of popular color. Based on the available information, a color is popular either
- when there are many different parts in that color existing in the inventory (a), or
- when the quantity of parts is large in that particular color (b).
It is unclear to me which one of these interpretations of the popular color is preferred, so a solution is presented for each of them.
Note that even though the colors table has the attribute id as the unique identifier, the name attribute is also unique among the records of the colors table. To verify this, the following queries were executed:
SELECT COUNT(DISTINCT name) FROM colors, andSELECT COUNT(*) FROM colorsBoth queries return the same result,135.
The two solutions presented below return the results for the top 5 favourite colors. A visual inspection of the two charts reveals that the ratios are very similar between them.
SELECT c.name, COUNT(ip.part_num) AS num_parts
FROM colors c JOIN inventory_parts ip ON c.id = ip.color_id
GROUP BY c.name
ORDER BY num_parts DESC
LIMIT 5
Top 5 favorite colors for parts (Solution (a))
SELECT c.name, SUM(ip.quantity) AS num_parts
FROM colors c JOIN inventory_parts ip ON c.id = ip.color_id
GROUP BY c.name
ORDER BY num_parts DESC
LIMIT 5Top 5 favorite colors for parts (Solution (b))
Q5: What proportion of the lego parts are transparent?
A transparent part is one that has a transparent color, indicated by the is_trans attribute of the colors table. The same part (part_num) can be ordered in various colors, either transparent or not. Note that there are 1455 parts that are not part of the inventory_parts, so no conclusion can be drawn with respect to these ones.
To answer this question, either the quantity of the parts or the number of distinct parts can be used as measure. If the quantities of the parts is used, the following formula is applied:
Alternatively, if the number of distinct parts is used, the following formula is applied:
Two solutions are presented, one for each of the formulas above. Interestingly, although each solution returns a different result, they are very close in range. Since a part can be ordered in various colors, the difference in the two numbers can be explained as an overcounting of the parts.
- When the first formula is used,
4.94%of the Lego parts are transparent. - When the second formula is used,
6.29%of the Lego parts are transparent.
When the second formula is used, the parts can be counted in different ways: considering their color, or disregarding the color. That is, part (A, red) is different than part (A, blue). When the counting is done in this way, a different final numbers are obtained, 6.11%. Note that 6.11% and 6.29% are very close in range, indicating that overcounting is less prevalent in this case.
WITH quantity_all AS
(SELECT SUM(quantity) AS total
FROM inventory_parts
),
quantity_transparent_parts AS
(SELECT SUM(ip.quantity) AS trans_parts
FROM colors c JOIN inventory_parts ip ON c.is_trans = True AND c.id = ip.color_id
)
SELECT ROUND(trans_parts * 100.0/total, 2)
FROM quantity_all, quantity_transparent_parts