Python Exercise: Lego Sets x Star Wars

Background

  • Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.
  • Its first licensed series, Star Wars, was a hit that sparked a string of collaborations on further themed sets. The partnerships team has asked you to analyse this success. Before diving in, they suggest reading the descriptions of the two datasets given below.

The Data

You have been provided with two datasets to use. A summary and preview are provided below.

lego_sets.csv

| Column | Description |
| --- | --- |
| set_num | A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid! |
| name | The name of the set. |
| year | The date the set was released. |
| num_parts | The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable. |
| theme_name | The name of the sub-theme of the set. |
| parent_theme | The name of the parent theme the set belongs to. Matches the name column of the parent_themes csv file. |

parent_themes.csv

| Column | Description |
| --- | --- |
| id | A code that is unique to every theme. |
| name | The name of the parent theme. |
| is_licensed | A Boolean column specifying whether the theme is a licensed theme. |
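
The column descriptions above imply a few checks worth running before any analysis. The sketch below is one possible way to express them; the two small dataframes are made-up stand-ins for the CSV files (their values are illustrative only), and the variable names `missing_set_num`, `missing_num_parts` and `unmatched` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for lego_sets.csv and parent_themes.csv (illustrative values, not real data)
lego_sets = pd.DataFrame({
    "set_num": ["001-1", None, "003-1"],
    "name": ["Set A", "Set B", "Set C"],
    "year": [1999, 2001, 2005],
    "num_parts": [120.0, None, 45.0],
    "theme_name": ["Episode I", "Basic Set", "Hogwarts"],
    "parent_theme": ["Star Wars", "Town", "Harry Potter"],
})
parent_themes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Star Wars", "Town", "Harry Potter"],
    "is_licensed": [True, False, True],
})

# A missing set_num marks a duplicate or invalid row, so count how many there are
missing_set_num = lego_sets["set_num"].isna().sum()

# Missing num_parts values are acceptable, so they are only counted, not removed
missing_num_parts = lego_sets["num_parts"].isna().sum()

# Every parent_theme should appear in the name column of parent_themes
is_known = lego_sets["parent_theme"].isin(parent_themes["name"])
unmatched = lego_sets.loc[~is_known, "parent_theme"].unique()

print("rows without set_num:", missing_set_num)
print("rows without num_parts:", missing_num_parts)
print("parent themes with no match:", list(unmatched))
```

With the real files, the two dataframes would come from `pd.read_csv` instead of being built by hand.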

Exercise

All Lego Sets

These are the first steps:

  • import libraries and standardise appearance
  • read the requisite files
  • merge the dataframes (dfs)
  • drop nulls

Importing and cleaning

# import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

# standardise appearance
# set the seaborn style first: sns.set_style() resets the text, label and tick
# colours, so the rcParams overrides below must come after it
sns.set_style("darkgrid")
star_wars_palette = ["#1E90FF", "#FFD700", "#228B22", "#DC143C", "#8A2BE2"]
sns.set_palette(star_wars_palette)
mpl.rcParams['axes.titleweight'] = 'bold'
mpl.rcParams['figure.titleweight'] = 'bold'
mpl.rcParams['font.weight'] = 'regular'
mpl.rcParams['axes.labelweight'] = 'regular'
mpl.rcParams['text.color'] = '#42423E'
mpl.rcParams['axes.labelcolor'] = '#151515'
mpl.rcParams['xtick.color'] = '#CECED0'
mpl.rcParams['ytick.color'] = '#878d8f'
mpl.rcParams['axes.titlesize'] = 12
mpl.rcParams['axes.labelsize'] = 10
mpl.rcParams['figure.figsize'] = (10, 5)

# read the files
parent_themes = pd.read_csv('data/parent_themes.csv')

# check for duplicates
print("Are rows in parent_themes unique: ", parent_themes.id.is_unique)

# rename columns to eliminate ambiguity
parent_themes = parent_themes.add_prefix('theme_')

# preview the data
parent_themes.head()
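
The later cells rely on a `merged_lego` dataframe whose construction is not shown here. The sketch below is one way the remaining steps from the list above (read the sets, merge, drop nulls) might look. It is an assumption, not the author's code: the toy dataframes stand in for `pd.read_csv('data/lego_sets.csv')` and the prefixed `parent_themes`, and renaming the sub-theme column to `sub_theme` is a hypothetical choice made so that it does not clash with the prefixed `theme_name`.

```python
import pandas as pd

# Toy stand-in for pd.read_csv('data/lego_sets.csv') (illustrative values only)
lego_sets = pd.DataFrame({
    "set_num": ["001-1", None, "003-1"],
    "name": ["Set A", "Set B", "Set C"],
    "year": [1999, 2001, 2005],
    "num_parts": [120.0, None, 45.0],
    "theme_name": ["Episode I", "Basic Set", "Hogwarts"],
    "parent_theme": ["Star Wars", "Town", "Harry Potter"],
})

# Toy stand-in for parent_themes after add_prefix('theme_')
parent_themes = pd.DataFrame({
    "theme_id": [1, 2, 3],
    "theme_name": ["Star Wars", "Town", "Harry Potter"],
    "theme_is_licensed": [True, False, True],
})

# Assumption: rename the sub-theme column so that 'theme_name' refers
# unambiguously to the parent theme after the merge
lego_sets = lego_sets.rename(columns={"theme_name": "sub_theme"})

# Attach the licence flag to every set via its parent theme
merged_lego = lego_sets.merge(
    parent_themes,
    left_on="parent_theme",
    right_on="theme_name",
    how="left",
)

# Drop rows without a set_num (duplicates or invalid sets);
# missing num_parts values are left in place
merged_lego = merged_lego.dropna(subset=["set_num"])

print(merged_lego[["set_num", "theme_name", "theme_is_licensed"]])
```

A left merge keeps every set even if its parent theme has no match, which makes unmatched themes visible as missing values rather than silently dropping them.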

Licensed Themed LEGO® Sets

# subset merged lego to create licensed_legos
licensed_legos = merged_lego[merged_lego['theme_is_licensed'] == True]

# drop unwanted columns
licensed_legos = licensed_legos.drop(columns=['theme_is_licensed'])

licensed_length = len(licensed_legos)
all_length = len(merged_lego)
pc_licensed = round(licensed_length/all_length*100,0)

print(f'There are {licensed_length} licensed themed LEGO sets, or {pc_licensed:.0f} per cent of this dataset')

# preview the data
licensed_legos.head()
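
The percentage above is the number of licensed rows divided by the number of all rows. As a cross-check, the same share can be read straight from the Boolean column, because the mean of a Boolean column is the fraction of `True` values. The sketch below uses a made-up five-row stand-in for `merged_lego`, so the figure it prints says nothing about the real dataset.

```python
import pandas as pd

# Toy stand-in for the licence flag in merged_lego (illustrative values only)
merged_lego = pd.DataFrame(
    {"theme_is_licensed": [True, False, False, True, False]}
)

# The mean of a Boolean column equals the share of True values
pc_licensed = merged_lego["theme_is_licensed"].mean() * 100

print(f"{pc_licensed:.0f} per cent of sets are licensed")
```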
import plotly.express as px

# Define the color palette
star_wars_palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', 
                     '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

# get parent theme counts and set order
pt_order = licensed_legos['theme_name'].value_counts().head(15).index

# filter the dataframe to include only the top 15 themes
top_15_licensed_legos = licensed_legos[licensed_legos['theme_name'].isin(pt_order)]

# plot using plotly
fig = px.bar(top_15_licensed_legos, 
             y='theme_name', 
             title='Count of Licensed LEGO\u00AE Sets by Parent Theme - Top 15', 
             labels={'theme_name': 'Theme of LEGO set', 'count': 'Count of Lego sets'},
             orientation='h',
             color_discrete_sequence=[star_wars_palette[0]])

fig.update_layout(yaxis={'categoryorder':'total ascending'})
fig.show()

# same chart with seaborn: get parent theme counts and set order
pt_order = licensed_legos['theme_name'].value_counts().head(15).index

# plot countplot, sort by count
plt.figure()
sns.countplot(y='theme_name', data=licensed_legos, order=pt_order)
plt.title('Count of Licensed LEGO\u00AE Sets by Parent Theme - Top 15')
plt.xlabel('Count of Lego sets')
plt.ylabel('Theme of LEGO set')
plt.show()

# all sets, licensed and unlicensed: get theme counts and set order
pt_order = merged_lego['theme_name'].value_counts().head(15).index

# plot countplot, sort by count
plt.figure()
sns.countplot(y='theme_name', data=merged_lego, order=pt_order, hue='theme_is_licensed', palette={False: '#878d8f', True: '#1E90FF'})
plt.title('Count of top 15 LEGO\u00AE Sets by Theme - Licensed and Unlicensed')
plt.xlabel('Count of Lego sets')
plt.ylabel('Theme of LEGO set')
plt.legend(title='License Status', labels=['Unlicensed', 'Licensed'])
plt.show()
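
The count plots above show the numbers only as bar lengths. To inspect the underlying figures, the same counts can be tabulated directly. This is a minimal sketch on a made-up stand-in for `merged_lego`; `top_n` and `n_sets` are hypothetical names, and `top_n` would be 15 in the exercise.

```python
import pandas as pd

# Toy stand-in for merged_lego (illustrative values only)
merged_lego = pd.DataFrame({
    "theme_name": ["Star Wars", "Star Wars", "Town", "Town", "Town", "Harry Potter"],
    "theme_is_licensed": [True, True, False, False, False, True],
})

# Number of themes to keep (15 in the exercise, 2 here to suit the toy data)
top_n = 2

# Count sets per theme and licence status, then keep the largest groups
counts = (
    merged_lego.groupby(["theme_name", "theme_is_licensed"])
    .size()
    .rename("n_sets")
    .reset_index()
    .sort_values("n_sets", ascending=False)
    .head(top_n)
)

print(counts)
```

Since each parent theme has a single licence status, grouping by both columns gives one row per theme while keeping the flag alongside the count.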