Predicting Hotel Cancellations
🏨 Background
You are supporting a hotel with a project aimed to increase revenue from their room bookings. They believe that they can use data science to help them reduce the number of cancellations. This is where you come in!
They have asked you to use any appropriate methodology to identify what contributes to whether a booking will be fulfilled or cancelled. They intend to use the results of your work to reduce the chance someone cancels their booking.
The Data
They have provided you with their bookings data in a file called hotel_bookings.csv, which contains the following:
| Column | Description |
|---|---|
Booking_ID | Unique identifier of the booking. |
no_of_adults | The number of adults. |
no_of_children | The number of children. |
no_of_weekend_nights | Number of weekend nights (Saturday or Sunday). |
no_of_week_nights | Number of week nights (Monday to Friday). |
type_of_meal_plan | Type of meal plan included in the booking. |
required_car_parking_space | Whether a car parking space is required. |
room_type_reserved | The type of room reserved. |
lead_time | Number of days before the arrival date the booking was made. |
arrival_year | Year of arrival. |
arrival_month | Month of arrival. |
arrival_date | Date of the month for arrival. |
market_segment_type | How the booking was made. |
repeated_guest | Whether the guest has previously stayed at the hotel. |
no_of_previous_cancellations | Number of previous cancellations. |
no_of_previous_bookings_not_canceled | Number of previous bookings that were canceled. |
avg_price_per_room | Average price per day of the booking. |
no_of_special_requests | Count of special requests made as part of the booking. |
booking_status | Whether the booking was cancelled or not. |
Source (data has been modified): https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset
import pandas as pd
hotels = pd.read_csv("data/hotel_bookings.csv")
hotelsThe Challenge
- Use your skills to produce recommendations for the hotel on what factors affect whether customers cancel their booking.
SELECT booking_status, COUNT(*) AS total, total/36275*100 AS percent_of_all_bookings
FROM hotels
GROUP BY booking_status;SELECT no_of_children, count(*) AS guests, guests/36275*100 AS percent_of_total
FROM hotels
GROUP BY no_of_children
ORDER BY percent_of_total DESC;The majority of guests who made bookings at this hotel (91.73%) reported having zero children on their room reservation.
SELECT booking_status, no_of_children, COUNT(*) AS total
FROM hotels
WHERE booking_status = 'Canceled'
GROUP BY booking_status, no_of_children
ORDER BY no_of_children DESC;SELECT type_of_meal_plan, count(*) AS total, total/36275*100 AS percent_of_all_bookings
FROM hotels
WHERE booking_status='Canceled'
GROUP BY type_of_meal_plan, booking_status
ORDER BY percent_of_all_bookings DESC;Guests who selected Meal Plan 1 were the most likely to cancel their reservation. 23.55% of guests who booked a reservation and selected Meal Plan 1 canceled their reservation.