London Public Transport In-depth Analysis

London Public Transport in depth Analysis

1. Understanding the data

1. Understanding the Structure and Content of the Table

First, we will inspect the structure and some sample content of the table.

DataFrameas

df

variable

SELECT *
FROM TFL.JOURNEYS
LIMIT 10;

2. Data Cleaning

Check for and handle missing values, duplicates, and outliers.

Total rows in the journeys table

DataFrameas

df1

variable

SELECT
	COUNT(*) AS total_rows
FROM TFL.JOURNEYS;

Check for Missing Values

DataFrameas

df2

variable

SELECT
	COUNT(*) AS total_rows,
	COUNT(month),
	COUNT(year),
	COUNT(days),
	COUNT(report_date),
	COUNT(journey_type),
	COUNT(journeys_millions),
FROM TFL.JOURNEYS;

The journeys_millions column in journeys table have 95 Null values.

Check for duplicate records(rows)

DataFrameas

df3

variable

SELECT DISTINCT COUNT(*) DISTINCT
FROM TFL.JOURNEYS;

The above query returned 936 Distinct rows which is equals to total rows in journeys table. Hence there are no duplicates in the table.

2. Exploratory Data Analysis

1. Descriptive Statistics

‌
‌
‌

London Public Transport In-depth Analysis

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}London Public Transport in depth Analysis

1. Understanding the data

1. Understanding the Structure and Content of the Table

2. Data Cleaning

2. Exploratory Data Analysis

1. Descriptive Statistics

London Public Transport in depth Analysis