EDA in Python for Absolute Beginners
In this live training, we'll perform exploratory data analysis (EDA) on a dataset of hotel bookings. It records many details about each booking, including room specifications, the length of stay, the time between the booking and the stay, whether the booking was canceled, and how the booking was made. The data was gathered between July 2015 and August 2017. You can consult the appendices at the bottom of the notebook for citations and an overview of all variables.
# Import the required packages
import pandas as pd
import plotly.express as px
Import the data
# Import hotel_bookings_clean_v2.csv
df = pd.read_csv('hotel_bookings_clean_v2.csv')
Basic exploration
# Show dimensions
# Are there missing values?
# Describe with summary statistics
# How many bookings were canceled?
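The four exploration steps above could be filled in roughly as follows. This is a minimal sketch on a tiny hypothetical DataFrame rather than the real bookings file: only `arrival_date_month` and `is_canceled` are names taken from this notebook, `lead_time` is an illustrative column, and `is_canceled` is assumed to be coded as 0/1.

```python
import pandas as pd

# Hypothetical stand-in for the bookings data (not the real file).
df = pd.DataFrame({
    "arrival_date_month": ["July", "July", "August", "August", "August"],
    "is_canceled": [0, 1, 0, 1, 1],        # assumed 0/1 coding
    "lead_time": [10, 200, 35, None, 90],  # illustrative column with one gap
})

# Show dimensions as (rows, columns)
n_rows, n_cols = df.shape

# Are there missing values? Count them per column
missing_per_column = df.isna().sum()

# Describe the numeric columns with summary statistics
summary = df.describe()

# With a 0/1 flag, the sum is the number of canceled bookings
# and the mean is the overall cancellation rate
n_canceled = int(df["is_canceled"].sum())
cancel_rate = df["is_canceled"].mean()

print(df.shape)
print(missing_per_column)
print(summary)
print(n_canceled, cancel_rate)
```

On the real data, the same calls would run on the `df` loaded from the CSV instead of the toy frame.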
Do cancellation rates differ across the year?
# Calculate and plot cancellations every month
cancellations = (
    df.filter(['arrival_date_month', 'is_canceled'])
    .groupby(by='arrival_date_month', as_index=False)
    .sum()
)
# Create bar chart of cancellations per month
# Calculate and plot total bookings every month
# Create bar chart of total bookings per month
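One possible way to get total bookings per month is to count rows per group, since each row is one booking. The sketch below uses a small hypothetical DataFrame and assumes months are stored as full English names. Because `groupby` sorts month names alphabetically, it also converts the column to an ordered categorical so the result comes out in calendar order; the `total_bookings` column name is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the bookings data (not the real file).
df = pd.DataFrame({
    "arrival_date_month": ["July", "August", "July", "January", "August", "August"],
    "is_canceled": [0, 1, 1, 0, 0, 1],
})

# Each row is one booking, so the group size is the number of bookings
total_bookings = (
    df.groupby("arrival_date_month")
    .size()
    .reset_index(name="total_bookings")  # column name is an assumption
)

# Calendar order, assuming full English month names in the data
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
total_bookings["arrival_date_month"] = pd.Categorical(
    total_bookings["arrival_date_month"], categories=month_order, ordered=True
)
total_bookings = (
    total_bookings.sort_values("arrival_date_month").reset_index(drop=True)
)

print(total_bookings)
```

The sorted frame can then be passed to `px.bar` with `x='arrival_date_month'` and `y='total_bookings'`, in the same way as the other charts in this section.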
# Calculate cancellation rates every month
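The rate calculation can be sketched by joining the monthly cancellations to the monthly totals and dividing one by the other. The example below runs on a small hypothetical DataFrame; the names `merged` and `pct_canceled` match the plotting call in this section, while `total_bookings` and the choice to express the rate as a percentage (0 to 100) are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the bookings data (not the real file).
df = pd.DataFrame({
    "arrival_date_month": ["July", "July", "July", "July", "August", "August"],
    "is_canceled": [1, 0, 0, 0, 1, 1],  # assumed 0/1 coding
})

# Cancellations per month: sum of the 0/1 flag
cancellations = (
    df.filter(["arrival_date_month", "is_canceled"])
    .groupby(by="arrival_date_month", as_index=False)
    .sum()
)

# Total bookings per month: number of rows in each group
total_bookings = (
    df.groupby("arrival_date_month")
    .size()
    .reset_index(name="total_bookings")
)

# Join the two tables on month and compute the rate as a percentage
merged = cancellations.merge(total_bookings, on="arrival_date_month")
merged["pct_canceled"] = merged["is_canceled"] / merged["total_bookings"] * 100

print(merged)
```

Comparing rates rather than raw counts matters here because a month with more bookings will usually also have more cancellations, even when guests are no more likely to cancel.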
# Create bar chart of cancellation rate every month
px.bar(merged, x='arrival_date_month', y='pct_canceled')
Does the number of nights influence the cancellation rate?
# Prepare the data
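A possible preparation step is to derive the total length of stay and then compute the cancellation rate for each stay length. The sketch below assumes the data has separate `stays_in_weekend_nights` and `stays_in_week_nights` columns; those names, along with `total_nights`, `total_bookings`, `cancellations`, and `pct_canceled`, are assumptions for illustration, and the DataFrame is a small hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in for the bookings data; the column names for
# weekend and week nights are assumed, not confirmed by this notebook.
df = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1, 0, 2, 0],
    "stays_in_week_nights": [1, 3, 1, 2, 3, 1],
    "is_canceled": [0, 1, 0, 1, 1, 1],  # assumed 0/1 coding
})

# Total length of stay in nights
df["total_nights"] = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]

# Bookings and cancellations for each stay length
by_nights = df.groupby("total_nights", as_index=False).agg(
    total_bookings=("is_canceled", "count"),
    cancellations=("is_canceled", "sum"),
)

# Cancellation rate as a percentage for each stay length
by_nights["pct_canceled"] = (
    by_nights["cancellations"] / by_nights["total_bookings"] * 100
)

print(by_nights)
```

On real data, very long stays are likely to be rare, so their rates rest on few bookings; keeping `total_bookings` in the table makes that easy to check before reading too much into a single bar.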