Predicting Hotel Cancellations
🏨 Background
You are supporting a hotel with a project aimed at increasing revenue from its room bookings. The hotel believes it can use data science to reduce the number of cancellations. This is where you come in!
They have asked you to use any appropriate methodology to identify what contributes to whether a booking will be fulfilled or cancelled. They intend to use the results of your work to reduce the chance someone cancels their booking.
The Data
They have provided you with their bookings data in a file called hotel_bookings.csv, which contains the following:
Column | Description |
---|---|
Booking_ID | Unique identifier of the booking. |
no_of_adults | The number of adults. |
no_of_children | The number of children. |
no_of_weekend_nights | Number of weekend nights (Saturday or Sunday). |
no_of_week_nights | Number of week nights (Monday to Friday). |
type_of_meal_plan | Type of meal plan included in the booking. |
required_car_parking_space | Whether a car parking space is required. |
room_type_reserved | The type of room reserved. |
lead_time | Number of days before the arrival date the booking was made. |
arrival_year | Year of arrival. |
arrival_month | Month of arrival. |
arrival_date | Date of the month for arrival. |
market_segment_type | How the booking was made. |
repeated_guest | Whether the guest has previously stayed at the hotel. |
no_of_previous_cancellations | Number of previous cancellations. |
no_of_previous_bookings_not_canceled | Number of previous bookings that were not canceled. |
avg_price_per_room | Average price per day of the booking. |
no_of_special_requests | Count of special requests made as part of the booking. |
booking_status | Whether the booking was cancelled or not. |
Source (data has been modified): https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pointbiserialr
from sklearn.preprocessing import LabelEncoder
from scipy.stats import chisquare
from scipy.stats import chi2_contingency
from scipy.stats import mannwhitneyu
from datetime import datetime
hotels = pd.read_csv("data/hotel_bookings.csv")
hotels
hotels['booking_status'].value_counts()
The Challenge
Use your skills to produce recommendations for the hotel on what factors affect whether customers cancel their booking:
- What factors affect whether customers cancel their booking?
- Are cancellations more likely during weekends?
- Which general recommendations for the hotel can you make?
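One way to approach the weekend question is to compare the cancellation rate of bookings that include at least one weekend night with the rate for weekday-only bookings. The sketch below uses a handful of made-up rows rather than the real file, and it assumes that booking_status is encoded as "Canceled" / "Not_Canceled"; adjust the labels if the file uses different values.

```python
import pandas as pd

# Made-up rows for illustration only; these are not hotel records.
# Column names follow the data dictionary; the "Canceled" / "Not_Canceled"
# labels are an assumption about how booking_status is encoded.
sample = pd.DataFrame({
    "no_of_weekend_nights": [0, 2, 1, 0, 0, 2, 1, 0],
    "booking_status": [
        "Not_Canceled", "Canceled", "Canceled", "Not_Canceled",
        "Canceled", "Not_Canceled", "Canceled", "Not_Canceled",
    ],
})

# Label each booking by whether it includes at least one weekend night
sample["stay_type"] = (sample["no_of_weekend_nights"] > 0).map(
    {True: "includes weekend", False: "weekdays only"}
)

# Encode the target as 1 (canceled) / 0 (not canceled)
sample["is_canceled"] = (sample["booking_status"] == "Canceled").astype(int)

# The mean of a 0/1 column is the share of canceled bookings per group
weekend_rates = sample.groupby("stay_type")["is_canceled"].mean()
print(weekend_rates)
```

On the real data, the same grouping can be applied to `hotels` directly; a difference in rates is only descriptive until it is backed by a statistical test.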
Based on the given dataset, there are several factors that could potentially affect whether customers cancel their booking at the hotel. Here are some recommendations:
- Booking Timing and History: Analyze the impact of booking timing, including lead time (number of days before the arrival date the booking was made), and previous booking history, including whether the guest is a repeated guest, number of previous cancellations, and number of previous bookings that were not canceled, on cancellations. Bookings made far in advance and a history of cancellations may increase the likelihood of future cancellations. The hotel could consider offering incentives for customers who book further in advance, implementing a flexible cancellation policy for bookings made with short lead times, and implementing targeted communication and personalized offers to encourage repeated guests to honor their bookings and minimize cancellations.
- Market Segment Type: Examine the market segment type, which indicates how the booking was made. Different market segments may have varying cancellation behavior. For example, bookings made through online travel agencies (OTA) may have higher cancellation rates compared to direct bookings. The hotel could tailor their cancellation policies and communication strategies based on the market segment type to minimize cancellations.
- Average Price per Room: Consider the average price per room of the booking. Higher-priced bookings may be more likely to be canceled if customers find better deals elsewhere or experience changes in their travel plans. The hotel could analyze pricing strategies and competitive pricing in the market to ensure their prices are competitive and attractive to customers, thereby reducing cancellations.
- Special Requests: Analyze the count of special requests made as part of the booking. Customers who make special requests may have specific preferences or requirements for their stay, and fulfilling these requests could increase customer satisfaction and reduce the likelihood of cancellations. The hotel could prioritize special requests and communicate proactively with guests to ensure their needs are met.
- Type of Meal Plan, Car Parking Space, and Room Type: Analyze the impact of the type of meal plan included in the booking, whether a car parking space is required, and the type of room reserved on cancellations. Customers who have specific preferences for meal plans, car parking, and room type may be more likely to cancel if their preferences are not met. The hotel could offer a variety of options to accommodate different preferences and ensure accurate bookings, thereby reducing cancellations.
- Arrival Date and Month: Consider the arrival date and month for bookings. Seasonal factors, events, or holidays may influence cancellations. For example, bookings during peak holiday seasons may have higher cancellation rates due to changing travel plans. The hotel could analyze historical data and trends to identify patterns and adjust their cancellation policies and communication strategies accordingly.
By analyzing and understanding these factors, the hotel can make informed decisions and implement strategies to reduce cancellations and increase revenue from room bookings.
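The factors listed above are hypotheses until they are checked against the data. As a minimal sketch of how two of them could be tested, the block below runs a chi-square test of independence for a categorical factor (market segment) and a Mann-Whitney U test for a numeric one (lead time). The rows are made up for illustration, the segment names and the "Canceled" / "Not_Canceled" labels are assumptions, and with so few rows the chi-square p-value is not reliable; the point is only the shape of the calls.

```python
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu

# Made-up rows for illustration only; segment names and status labels
# are assumptions, not values confirmed from hotel_bookings.csv.
sample = pd.DataFrame({
    "market_segment_type": ["Online"] * 4 + ["Offline"] * 4 + ["Corporate"] * 4,
    "lead_time": [120, 200, 90, 15, 150, 10, 30, 5, 2, 7, 1, 20],
    "booking_status": [
        "Canceled", "Canceled", "Canceled", "Not_Canceled",
        "Canceled", "Not_Canceled", "Not_Canceled", "Not_Canceled",
        "Not_Canceled", "Not_Canceled", "Not_Canceled", "Not_Canceled",
    ],
})
sample["is_canceled"] = (sample["booking_status"] == "Canceled").astype(int)

# Categorical factor: cancellation rate per segment, then a chi-square
# test of independence on the segment x status contingency table
rate_by_segment = sample.groupby("market_segment_type")["is_canceled"].mean()
contingency = pd.crosstab(sample["market_segment_type"], sample["booking_status"])
chi2, p_cat, dof, expected = chi2_contingency(contingency)

# Numeric factor: compare the lead-time distributions of canceled and
# kept bookings without assuming normality
canceled = sample.loc[sample["is_canceled"] == 1, "lead_time"]
kept = sample.loc[sample["is_canceled"] == 0, "lead_time"]
u_stat, p_num = mannwhitneyu(canceled, kept, alternative="two-sided")

print(rate_by_segment)
print(f"chi-square p-value (segment vs. status): {p_cat:.4f}")
print(f"Mann-Whitney p-value (lead time): {p_num:.4f}")
```

The same pattern extends to the other factors: categorical columns such as type_of_meal_plan or room_type_reserved fit the contingency-table test, while numeric columns such as avg_price_per_room or no_of_special_requests fit the two-sample comparison.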
Data Exploration and Preparation
We will use a few simple commands to get an overview of the dataset.
# Print shape of the dataframe
print("Shape of the dataframe: ", hotels.shape)
print("-----------------------------------------------------------------------------------------")
# Print number of unique values in each column
print("Number of unique values in each column (excluding NaNs): ")
print("-----------------------------------------------------------------------------------------")
print(hotels.nunique(dropna=True))
print("-----------------------------------------------------------------------------------------")
# Print information about the dataframe
print("Information about the dataframe: ")
print("-----------------------------------------------------------------------------------------")
# info() prints its summary directly and returns None, so it is not wrapped in print()
hotels.info()
We will check for null values and drop the rows that contain them, so that the analysis runs on complete records.
# Use isna() or isnull() method to check for NaN values
nan_values = hotels.isna()
# You can also use isna().sum() or isnull().sum() to get the count of NaN values in each column
nan_values_count = hotels.isna().sum()
print("\nCount of NaN values in 'hotels' DataFrame:")
print("-----------------------------------------------------------------------------------------")
print(nan_values_count)
# Use dropna() method to drop rows with NaN values
hotels = hotels.dropna()
# Print statement before the output
print("Printing the updated 'hotels' DataFrame after dropping rows with NaN values:")
# Print the updated 'hotels' DataFrame after dropping rows with NaN values
display(hotels.head())
# Print statement before the output
print("Printing the shape of the 'hotels' DataFrame:")
print("-----------------------------------------------------------------------------------------")
print(hotels.shape)
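Dropping every row with a missing value can discard a large part of the data, so it may be worth measuring the loss before committing to it. The sketch below shows one way to do that on a small made-up frame; the values are illustrative only, and the variable names (`missing_share`, `share_dropped`) are hypothetical helpers rather than part of the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; these are not hotel records
sample = pd.DataFrame({
    "lead_time": [10, np.nan, 45, 200, 3],
    "avg_price_per_room": [95.0, 120.5, np.nan, 80.0, 110.0],
    "booking_status": [
        "Canceled", "Not_Canceled", "Canceled", "Not_Canceled", "Not_Canceled",
    ],
})

# Fraction of missing values in each column
missing_share = sample.isna().mean()

# Number of rows that dropna() would remove (rows with at least one NaN)
rows_with_any_nan = int(sample.isna().any(axis=1).sum())

# Drop the incomplete rows and report how much data was lost
cleaned = sample.dropna()
share_dropped = 1 - len(cleaned) / len(sample)

print(missing_share)
print(f"Rows with at least one NaN: {rows_with_any_nan}")
print(f"Share of rows dropped: {share_dropped:.0%}")
```

If the share dropped turns out to be large on the real data, imputing values or dropping only on the columns used in a given test are alternatives to consider.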