Project: Examining the History of Lego Sets

Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.

The introduction of its first licensed series, Star Wars, was a hit that sparked a series of collaborations with more themed sets. The partnerships team has asked you to analyze this success. Before diving in, they suggest reading the descriptions of the two datasets, given below.

The Data

You have been provided with two datasets to use. A summary and preview are provided below.

lego_sets.csv

Column	Description
`"set_num"`	A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid!
`"name"`	The name of the set.
`"year"`	The year the set was released.
`"num_parts"`	The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable.
`"theme_name"`	The name of the sub-theme of the set.
`"parent_theme"`	The name of the parent theme the set belongs to. Matches the `"name"` column of parent_themes.csv.

parent_themes.csv

Column	Description
`"id"`	A code that is unique to every theme.
`"name"`	The name of the parent theme.
`"is_licensed"`	A Boolean column specifying whether the theme is a licensed theme.
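
The column notes above imply two different policies for missing values: a row with a missing `"set_num"` should be discarded, while a missing `"num_parts"` can stay. A minimal sketch of that rule follows. The frame `toy_sets` and its rows are made up for illustration and are not taken from lego_sets.csv.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from lego_sets.csv
toy_sets = pd.DataFrame({
    "set_num": ["0001-1", None, "0003-1"],
    "name": ["Set A", "Set B", "Set C"],
    "num_parts": [120.0, 45.0, None],
})

# A missing set_num marks a duplicate or invalid row, so drop that row;
# a missing num_parts is acceptable, so it is left untouched.
clean_sets = toy_sets.dropna(subset=["set_num"])

print(len(toy_sets), len(clean_sets))
```

Passing `subset=["set_num"]` restricts the check to that one column, so the row with a missing part count survives.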

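# Import pandas, then read and inspect the datasets
import pandas as pd

lego_sets = pd.read_csv('data/lego_sets.csv')
lego_sets.head()

# Keep only the parent themes flagged as licensed
parent_themes = pd.read_csv('data/parent_themes.csv')
theme_licensed = parent_themes[parent_themes['is_licensed'] == True]
theme_licensed.shape

# Keep the sets that belong to a licensed theme, then drop rows with a missing
# set_num, which the data description marks as duplicate or invalid
sets_licensed = lego_sets[lego_sets['parent_theme'].isin(theme_licensed['name'])]
sets_licensed = sets_licensed.dropna(subset=['set_num'])
sets_licensed.shape[0]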


# Select the Star Wars sets from the licensed sets so both counts share the same base
nb_sets_licensed_starwars = sets_licensed[sets_licensed['parent_theme'] == 'Star Wars']

nb_sets_licensed_starwars.shape[0]

# Percentage of licensed sets that are Star Wars, truncated to an integer
the_force = int(nb_sets_licensed_starwars.shape[0] * 100 / sets_licensed.shape[0])
the_force
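
The share computed above is a ratio of row counts that is then truncated to an integer. The same arithmetic can be sketched on a toy frame. `toy_licensed` and `toy_force` are hypothetical names, and the four rows are invented for illustration.

```python
import pandas as pd

# Hypothetical licensed sets for illustration only
toy_licensed = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1"],
    "parent_theme": ["Star Wars", "Star Wars", "Harry Potter", "Super Heroes"],
})

# The mean of a Boolean mask is the fraction of rows that are Star Wars
star_wars_share = (toy_licensed["parent_theme"] == "Star Wars").mean()

# int() truncates toward zero, mirroring the integer conversion of the_force
toy_force = int(star_wars_share * 100)
print(toy_force)
```

Taking the mean of the mask is an alternative to dividing two `shape[0]` values; both give the same fraction as long as the numerator and denominator come from the same frame.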

nb_sets_licensed_starwars

# Count the Star Wars sets released each year
starwars_sets_by_year = nb_sets_licensed_starwars['year'].value_counts()
starwars_sets_by_year

# Find the year with the highest number of Star Wars sets released
year_with_most_sets = starwars_sets_by_year.idxmax()
year_with_most_sets

new_era = year_with_most_sets
new_era
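
The year lookup above combines `value_counts` with `idxmax`. A small sketch of how the two fit together, using a made-up series of release years (`toy_years` is hypothetical and not drawn from the dataset):

```python
import pandas as pd

# Hypothetical release years for illustration only
toy_years = pd.Series([1999, 2000, 2000, 2001, 2000, 2001], name="year")

# value_counts returns one entry per distinct year, indexed by the year itself
counts = toy_years.value_counts()

# idxmax returns the index label (the year) of the largest count,
# whereas max returns the count itself
busiest_year = counts.idxmax()
busiest_count = counts.max()
print(busiest_year, busiest_count)
```

Because the result of `value_counts` is a Series rather than a DataFrame, it has an index and values but no columns to rename.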

import seaborn as sns
import matplotlib.pyplot as plt

# Build a year/count table from the Star Wars sets
year_counts = nb_sets_licensed_starwars['year'].value_counts().reset_index()
year_counts.columns = ['year', 'count']

# The counts are already aggregated, so plot them with barplot;
# countplot on this table would count each year exactly once
sns.barplot(x='year', y='count', data=year_counts, order=year_counts['year'])

# Order the bars from the year with the most sets to the year with the fewest
year_order = nb_sets_licensed_starwars['year'].value_counts().index

by_year = sns.countplot(x=nb_sets_licensed_starwars['year'],
                        order=year_order,
                        color='yellow')

# Rotate the year labels so they do not overlap
by_year.tick_params(axis='x', labelrotation=90)
by_year.set_title('Number of sets per year')

# Add the value labels on the bars
for p in by_year.patches:
    by_year.annotate(format(p.get_height(), '.0f'), 
                     (p.get_x() + p.get_width() / 2., p.get_height()), 
                     ha = 'center', va = 'center', 
                     xytext = (0, -7), 
                     textcoords = 'offset points')

plt.show()
