Lego: Data Analysis

📌 Project Overview

Title: "SQL Data Cleaning, Maintenance, and Visualization using the LEGO Dataset"

Objective

This project aims to demonstrate data engineering and analytical skills using PostgreSQL. We will:

Clean the data – handle missing values, duplicates, and inconsistencies.
Optimize and maintain the database – create indexes, use triggers, and implement stored procedures.
Perform exploratory data analysis (EDA) – generate reports and insights using SQL queries.
Prepare data for visualization – export data for Power BI/Tableau dashboards.

Unknown table

📌 Step 1: Exploring the Database

🎯 Goal: Before running any query, we need to understand the structure of the dataset.

🔍 Query Explanation: We'll list all available tables in the database.

DataFrameas

df

variable

-- List all tables in the database
SELECT table_name 
FROM information_schema.tables 
WHERE table_schema = 'public';

📌 Step 2: Identify Relationships Between Tables

We need to check how tables are connected using foreign keys.

🔍 Query Explanation This query finds foreign key relationships in the database.

DataFrameas

df2

variable

SELECT 
    conrelid::regclass AS table_name, 
    a.attname AS column_name, 
    confrelid::regclass AS referenced_table, 
    af.attname AS referenced_column
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey) AND a.attrelid = c.conrelid
JOIN pg_attribute af ON af.attnum = ANY(c.confkey) AND af.attrelid = c.confrelid
WHERE c.contype = 'f';

📌 Step 3: Exploring the Data

Before cleaning, we need to analyze:

The number of rows in each table. A sample of the data to identify inconsistencies. Potential missing values. 🔍 Query: Count Rows in Each Table 🎯 Goal: Get an overview of how much data we have in each table.

DataFrameas

df3

variable

-- Count the number of rows in each table
SELECT 'sets' AS table_name, COUNT(*) AS row_count FROM sets
UNION ALL
SELECT 'themes', COUNT(*) FROM themes
UNION ALL
SELECT 'parts', COUNT(*) FROM parts
UNION ALL
SELECT 'part_categories', COUNT(*) FROM part_categories
UNION ALL
SELECT 'colors', COUNT(*) FROM colors
UNION ALL
SELECT 'inventories', COUNT(*) FROM inventories
UNION ALL
SELECT 'inventory_parts', COUNT(*) FROM inventory_parts
UNION ALL
SELECT 'inventory_sets', COUNT(*) FROM inventory_sets;

📌 Step 4: Check for Missing Values

We now check for NULL values in all tables.

🎯 Goal: Identify columns with missing data.

Hidden code df4

📌 Step 5: Checking for Duplicate Entries

Since primary keys should be unique, let's verify that:

set_num in sets is unique. id in themes is unique. part_num in parts is unique. id in inventories is unique. 🔍 Query: Find Duplicate Primary Keys

DataFrameas

df5

variable

-- Check for duplicate primary keys in major tables
SELECT 'sets' AS table_name, set_num::TEXT AS duplicate_value, COUNT(*) AS count
FROM sets 
GROUP BY set_num 
HAVING COUNT(*) > 1

UNION ALL

SELECT 'themes', id::TEXT, COUNT(*) 
FROM themes 
GROUP BY id 
HAVING COUNT(*) > 1

UNION ALL

SELECT 'parts', part_num::TEXT, COUNT(*) 
FROM parts 
GROUP BY part_num 
HAVING COUNT(*) > 1

UNION ALL

SELECT 'inventories', id::TEXT, COUNT(*) 
FROM inventories 
GROUP BY id 
HAVING COUNT(*) > 1;

📌 Next Step: Start Data Analysis

Now that we know the data is clean and structured, we can move to exploratory analysis.

📌 What are the Most Popular LEGO Themes?

🎯 Goal: Find the top 10 most released LEGO themes based on the number of sets.

‌
‌
‌

Lego: Data Analysis

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}📌 Project Overview

Title: "SQL Data Cleaning, Maintenance, and Visualization using the LEGO Dataset"

Objective

📌 Step 1: Exploring the Database

📌 Step 2: Identify Relationships Between Tables

📌 Step 3: Exploring the Data

📌 Step 4: Check for Missing Values

📌 Step 5: Checking for Duplicate Entries

📌 Next Step: Start Data Analysis

📌 What are the Most Popular LEGO Themes?

📌 Project Overview