London Public Transport In-depth Analysis
1. Understanding the Data
1. Understanding the Structure and Content of the Table
First, we inspect the structure of the TFL.JOURNEYS table and a sample of its rows.
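The preview described above can be reproduced outside the notebook with a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory table named journeys (the TFL schema prefix is dropped) holding a few hypothetical rows with the column names used in this analysis; the helpers make_demo_db and preview_table are illustrative names, not part of the original notebook. The column names come from the cursor's description attribute.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory stand-in for TFL.JOURNEYS with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE journeys (month INTEGER, year INTEGER, days INTEGER, "
        "report_date TEXT, journey_type TEXT, journeys_millions REAL)"
    )
    # Hypothetical sample rows, not real TfL figures.
    conn.executemany(
        "INSERT INTO journeys VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 2019, 31, "2019-01-31", "Bus", 100.0),
            (1, 2019, 31, "2019-01-31", "Underground & DLR", 90.0),
            (2, 2019, 28, "2019-02-28", "Bus", 95.0),
        ],
    )
    return conn

def preview_table(conn, table, limit=10):
    """Return (column_names, rows) for at most `limit` rows of `table`."""
    # Identifiers cannot be bound as parameters, so `table` must be trusted input.
    cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

conn = make_demo_db()
columns, rows = preview_table(conn, "journeys")
print(columns)
for row in rows:
    print(row)
```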
SELECT *
FROM TFL.JOURNEYS
LIMIT 10;
2. Data Cleaning
Check for and handle missing values, duplicates, and outliers.
Total Rows in the Journeys Table
SELECT
    COUNT(*) AS total_rows
FROM TFL.JOURNEYS;
Check for Missing Values
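SELECT
    COUNT(*) AS total_rows,
    -- COUNT(column) skips NULLs, so total_rows minus each count gives that column's NULL count
    COUNT(month) AS month_count,
    COUNT(year) AS year_count,
    COUNT(days) AS days_count,
    COUNT(report_date) AS report_date_count,
    COUNT(journey_type) AS journey_type_count,
    COUNT(journeys_millions) AS journeys_millions_count
FROM TFL.JOURNEYS;
The journeys_millions column in the journeys table has 95 NULL values.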
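The NULL count per column follows from the difference between COUNT(*) and COUNT(column). As a minimal sketch, assuming an in-memory SQLite table named journeys with a few hypothetical rows (one deliberately missing journeys_millions), the hypothetical helper null_counts computes that difference for every column in a single query.

```python
import sqlite3

# Column names taken from the missing-values query.
COLUMNS = ["month", "year", "days", "report_date", "journey_type", "journeys_millions"]

def null_counts(conn, table, columns):
    """Return {column: NULL count} using COUNT(*) - COUNT(column)."""
    # COUNT(*) counts every row; COUNT(column) counts only non-NULL values.
    exprs = ", ".join(f"COUNT(*) - COUNT({c})" for c in columns)
    # Identifiers are interpolated, so `table` and `columns` must be trusted input.
    row = conn.execute(f"SELECT {exprs} FROM {table}").fetchone()
    return dict(zip(columns, row))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journeys (month INTEGER, year INTEGER, days INTEGER, "
    "report_date TEXT, journey_type TEXT, journeys_millions REAL)"
)
# Hypothetical rows; the second has no journeys_millions value.
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2019, 31, "2019-01-31", "Bus", 100.0),
        (1, 2019, 31, "2019-01-31", "Underground & DLR", None),
        (2, 2019, 28, "2019-02-28", "Bus", 95.0),
    ],
)
print(null_counts(conn, "journeys", COLUMNS))
# → {'month': 0, 'year': 0, 'days': 0, 'report_date': 0, 'journey_type': 0, 'journeys_millions': 1}
```

Aggregates such as SUM and AVG already skip NULLs; if the incomplete rows should be dropped entirely, a WHERE journeys_millions IS NOT NULL filter is one option.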
Check for Duplicate Records (Rows)
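-- SELECT DISTINCT * collapses rows that are identical in every column
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM TFL.JOURNEYS) AS d;
The query returned 936 distinct rows, which equals the total number of rows in the journeys table, so the table contains no duplicate rows.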
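The duplicate check compares the total row count with the number of distinct rows; when duplicates do exist, grouping on every column shows which rows repeat. A minimal sketch, assuming an in-memory SQLite table named journeys with hypothetical rows that include one exact duplicate; duplicate_summary and duplicated_rows are illustrative helper names.

```python
import sqlite3

COLUMNS = "month, year, days, report_date, journey_type, journeys_millions"

def duplicate_summary(conn):
    """Return (total_rows, distinct_rows) for the journeys table."""
    total = conn.execute("SELECT COUNT(*) FROM journeys").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM journeys)"
    ).fetchone()[0]
    return total, distinct

def duplicated_rows(conn):
    """List rows that appear more than once, with their number of copies."""
    # Group on every column; any group with more than one row is a duplicate.
    return conn.execute(
        f"SELECT {COLUMNS}, COUNT(*) AS copies FROM journeys "
        f"GROUP BY {COLUMNS} HAVING COUNT(*) > 1"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journeys (month INTEGER, year INTEGER, days INTEGER, "
    "report_date TEXT, journey_type TEXT, journeys_millions REAL)"
)
# Hypothetical rows; the first row is inserted twice on purpose.
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2019, 31, "2019-01-31", "Bus", 100.0),
        (1, 2019, 31, "2019-01-31", "Bus", 100.0),
        (2, 2019, 28, "2019-02-28", "Bus", 95.0),
    ],
)
total, distinct = duplicate_summary(conn)
print(total, distinct, total == distinct)
print(duplicated_rows(conn))
```

DISTINCT and GROUP BY both treat NULL values as equal when comparing rows, so two rows that match everywhere and share a NULL journeys_millions still count as duplicates.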
2. Exploratory Data Analysis
1. Descriptive Statistics