survey example
Table of contents
- Introduction
- Overview
- Factors
- Prediction
- Recommendations
- Questions from management (short answers)
- Code
- References
1. Introduction
Booking cancellations are a major challenge for hotel owners and managers. The cancellation rate can reach 40% (according to the Spanish consulting company Mirai, which specializes in working with hotels). The risk is especially high for bookings made through online travel agencies (OTAs) such as Booking.com, Expedia, and others.
Clients may cancel either because of circumstances beyond their control (illness, changes at work) or simply because their travel plans or itineraries have changed.
Some customers make several reservations at different hotels in the same city so that, closer to the dates of the trip, they can keep only one of them. Others book a room in advance and then track price changes at the hotel itself and at competitors (OTAs often offer discounts) so that they can cancel and take a better offer.
As a result, some hotel rooms stay empty: after a cancellation (especially one made shortly before the expected check-in), there is much less chance of finding a new client, particularly during the peak period. This in turn leads to financial losses.
To minimize the risks, it is necessary to:
- know in what cases cancellation can be expected, and what factors affect it;
- be able to predict the probability that a given booking will be canceled;
- predict the percentage of cancellations for the hotel in a given period;
- develop a strategy to reduce the number of cancellations and associated financial losses based on data analysis.
2. Overview
For the analysis, we were provided with data on the hotel's operation from July 2017 to December 2018. The average cancellation rate was 33%, which is in line with the market norm (according to Mirai). If we exclude the first month (July 2017), when the number of orders was minimal, the cancellation rate looks as follows: in the first "low" season (winter 2017-2018) it was at its minimum, 2.5%, and during the only "high" season available for analysis (summer 2018) it rose to 46%, which should affect the hotel's financial performance.
At the same time, the beginning of the second "low" season (end of autumn and beginning of winter 2018) differs from what was observed in 2017: the number of orders exceeded the previous year's figures, but the cancellation rate remained relatively high, falling below 20% only by December. Thus, the cancellation data show not only seasonality but also a growing trend. It must be noted, however, that a year and a half of data is not enough to speak confidently about seasonality.
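The monthly picture described above can be reproduced with a simple aggregation. The following is a minimal sketch, not the code used for this report: it assumes each booking is a record with arrival_year and arrival_month fields plus a boolean "canceled" flag (a hypothetical name), and the sample rows are invented for illustration.

```python
from collections import defaultdict


def monthly_cancellation_rate(bookings):
    """Return {(year, month): share of canceled bookings}, ordered by period."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for booking in bookings:
        period = (booking["arrival_year"], booking["arrival_month"])
        totals[period] += 1
        if booking["canceled"]:  # hypothetical boolean flag, an assumption
            canceled[period] += 1
    return {period: canceled[period] / totals[period] for period in sorted(totals)}


# Toy rows invented for illustration only (not the hotel's data).
sample = [
    {"arrival_year": 2018, "arrival_month": 7, "canceled": True},
    {"arrival_year": 2018, "arrival_month": 1, "canceled": False},
    {"arrival_year": 2018, "arrival_month": 7, "canceled": False},
    {"arrival_year": 2018, "arrival_month": 1, "canceled": False},
]

rates = monthly_cancellation_rate(sample)
for (year, month), rate in rates.items():
    print(f"{year}-{month:02d}: {rate:.0%}")
```

Grouping by arrival month rather than booking month is a choice made here for simplicity; either grouping could be used to look for seasonality.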
3. Factors
3.1 Global factors
For the analysis, the hotel management provided detailed information about the bookings made by guests (not just "canceled / not canceled" values). The main objective of the study was to identify the factors that most strongly affect the likelihood of order cancellation.
Factors can be divided into three large groups:
- customer information (number of adult guests, number of children, type of booking, whether the guest has visited the hotel before, the number of previous canceled and not canceled bookings);
- facilities information (type of meal, parking requests, room category, average price per day, number of special requests);
- terms information (number of nights on weekdays, number of nights on weekends, time before check-in at the hotel, day, month, and year when the order was made).
The most important factors come from the second group (facilities information). Terms information is second in importance. Customer information is the least significant.
To get a numerical estimate, we use predictions. If we use all the data provided for the study, then at 60% precision (out of 10 clients for whom the model predicts cancellation, it is wrong in four cases), recall is 71%* (out of 10 clients who actually cancel, the model identifies 7).
If we fix the precision of the model at 60% (that is, we accept that it is wrong for no more than 40% of the customers it flags) and use only a subset of the features, we can observe how recall decreases:
- only customer information: recall 1%;
- only facilities information: recall 36%;
- only terms information: recall 6%;
- everything except customer information: recall 52%;
- everything except facilities information: recall 9%;
- everything except terms information: recall 52%.
*We obtained this result by averaging 6 monthly predictions (see the "Prediction" section).
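To make the "recall at fixed precision" comparison concrete, here is a minimal sketch of how such a number can be computed from predicted cancellation scores. It is an illustration under stated assumptions, not the procedure used for the figures above: the function name, the threshold sweep, and the toy scores are all hypothetical.

```python
def recall_at_precision(y_true, scores, min_precision=0.6):
    """Highest recall reachable by thresholding scores with precision >= min_precision.

    y_true: 1 if the booking was actually canceled, else 0.
    scores: predicted cancellation scores (higher means more likely to cancel).
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for flagged, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Bookings with identical scores must be flagged together.
        if flagged < len(ranked) and ranked[flagged][0] == score:
            continue
        precision = true_positives / flagged
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


# Toy example invented for illustration: 10 bookings, 4 actual cancellations.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(recall_at_precision(y_true, scores, min_precision=0.6))  # 0.75
```

In the toy data, flagging the five highest-scored bookings gives 3 correct flags out of 5 (60% precision) and finds 3 of the 4 real cancellations (75% recall). Repeating the same calculation for models trained on different feature subsets would give a comparison of the kind listed above.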
3.2 Particular factors (Top-5)
This top-5 list is based on the feature importances of the model (CatBoost): these features have the highest weight, which means they carry the most information about the behavior of the client (whether she/he cancels the order or not).
- Number of days since the first hotel visit (derived from lead_time, arrival_year, arrival_month, arrival_date)
- Number of special requests (no_of_special_requests)
- Booking type (market_segment_type)
- Number of days before arrival (lead_time)
- Price difference from the median room price (derived from avg_price_per_room, room_type_reserved)
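As an illustration of how a derived feature such as the price difference from the median room price might be built, here is a minimal sketch. It assumes the median is taken per room category (room_type_reserved) over all bookings; the report does not spell out the exact definition, so this grouping, the function name, and the output field name price_diff_from_median are assumptions.

```python
from collections import defaultdict
from statistics import median


def add_price_diff(bookings):
    """Return copies of the bookings with the price difference from the room-type median."""
    prices_by_room = defaultdict(list)
    for booking in bookings:
        prices_by_room[booking["room_type_reserved"]].append(booking["avg_price_per_room"])
    # Median price per room category (assumed definition of "median room price").
    medians = {room: median(prices) for room, prices in prices_by_room.items()}
    return [
        {
            **booking,
            "price_diff_from_median": booking["avg_price_per_room"]
            - medians[booking["room_type_reserved"]],
        }
        for booking in bookings
    ]


# Toy rows invented for illustration only.
sample = [
    {"room_type_reserved": "Room_Type 1", "avg_price_per_room": 80.0},
    {"room_type_reserved": "Room_Type 1", "avg_price_per_room": 100.0},
    {"room_type_reserved": "Room_Type 1", "avg_price_per_room": 120.0},
    {"room_type_reserved": "Room_Type 4", "avg_price_per_room": 150.0},
    {"room_type_reserved": "Room_Type 4", "avg_price_per_room": 170.0},
]

enriched = add_price_diff(sample)
for row in enriched:
    print(row["room_type_reserved"], row["price_diff_from_median"])
```

A positive value means the guest pays more than the typical price for that room category, a negative value means less.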
3.3 Particular factors (Analysis of the presented data)
3.3.a Number of guests (adults and children)
The main conclusion about this factor: bookings made by one adult for one person were canceled 1.5 times less often than bookings with more than one guest. When traveling with a child, the cancellation rate is 5% higher, which is a very small difference. Bookings with 4 or more guests in total also stand out: the cancellation rate for such orders is above 40%.
Markers for hotel managers:
- low risk: a single adult guest;
- high risk: 4 or more guests.
3.3.b Number of nights at the hotel (weekends and weekdays)
Looking only at the average cancellation rate, the number of nights seems to be a simple and easily interpreted factor: the higher the number, the higher the risk of cancellation. Customers who booked only one night canceled 22% of their orders. For long-term trips (10 nights or more), the cancellation rate reached 60% and sometimes exceeded 80%.
But it is important to remember one thing: there are very few long-term trips:
- more than 85% of bookings are trips lasting 4 nights or less;
- more than 95% of orders are trips lasting less than 1 week.
Therefore, it is more appropriate not to split long-term trips ("more than 4 nights, but less than a week" and "more than a week") by individual days, but to group them. This gives the following picture:
- 1 night - cancellations occurred in 22% of cases;
- 2-4 nights - cancellations occurred in 32-35% of cases;
- more than 4 nights, but less than a week - cancellations occurred in 35% of cases;
- more than a week - cancellations occurred in 46% of cases.
The first group (18% of guests) and the last group (less than 5% of guests) stand out, but unfortunately they account for only a small share of bookings.
Separately, we can look at short trips (1-4 nights) in which at least one day was a weekend day, or all days were. In the first case the cancellation rate is 34% (versus 30% for bookings with weekdays only); in the second case it is 30% (versus 32% when at least one day was a weekday). Thus, the presence or absence of weekdays in a booking cannot serve as a marker for managers.
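The grouping described above can be sketched as follows. This is a minimal illustration, not the report's code: the field names no_of_week_nights and no_of_weekend_nights and the boolean "canceled" flag are assumptions, the last group is taken to start at 7 nights, and bookings with zero nights (if any exist) are kept in a separate group.

```python
from collections import defaultdict


def stay_group(total_nights):
    """Map the total number of nights to one of the groups used in the text."""
    if total_nights < 1:
        return "0 nights"  # assumption: kept apart, not discussed in the text
    if total_nights == 1:
        return "1 night"
    if total_nights <= 4:
        return "2-4 nights"
    if total_nights < 7:
        return "5-6 nights"
    return "7+ nights"  # assumption: "a week or more"


def cancellation_rate_by_stay(bookings):
    """Return {group: share of canceled bookings} for each stay-length group."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for booking in bookings:
        nights = booking["no_of_week_nights"] + booking["no_of_weekend_nights"]
        group = stay_group(nights)
        totals[group] += 1
        if booking["canceled"]:  # hypothetical boolean flag
            canceled[group] += 1
    return {group: canceled[group] / totals[group] for group in totals}


# Toy rows invented for illustration only.
sample = [
    {"no_of_week_nights": 1, "no_of_weekend_nights": 0, "canceled": False},
    {"no_of_week_nights": 2, "no_of_weekend_nights": 1, "canceled": True},
    {"no_of_week_nights": 2, "no_of_weekend_nights": 2, "canceled": False},
    {"no_of_week_nights": 6, "no_of_weekend_nights": 2, "canceled": True},
]

print(cancellation_rate_by_stay(sample))
```

Grouping the rare long stays this way avoids reading too much into per-day rates that rest on only a handful of bookings.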
Markers for hotel managers:
- low risk: one night booking;
- high risk: booking for a week or more.
3.3.c Type of meal
This factor is similar in usefulness to the previous one: there is a group of users that differs from the rest ("meal type 2", with 46% of cancellations), but it is small, about 8% of users. The remaining 92% fall into the groups "meal type 1" and "meal type not selected", where the cancellation rate is much lower, 31-33%. "Meal type 3" is chosen by less than 1% of users.
Markers for hotel managers:
- high risk: "meal type 2" is selected.
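The markers from sections 3.3.a to 3.3.c can be combined into one simple check that a booking system could run on each new order. The sketch below is only an illustration of that idea: the function name, argument names, and the "meal type 2" label format are hypothetical, and "a week or more" is read as 7 nights or more.

```python
def risk_markers(no_of_adults, no_of_children, total_nights, meal_plan):
    """Return a list of (level, reason) markers for one booking."""
    markers = []
    total_guests = no_of_adults + no_of_children

    # 3.3.a Number of guests
    if no_of_adults == 1 and no_of_children == 0:
        markers.append(("low", "single adult guest"))
    if total_guests >= 4:
        markers.append(("high", "4 or more guests"))

    # 3.3.b Number of nights
    if total_nights == 1:
        markers.append(("low", "one-night booking"))
    if total_nights >= 7:  # assumption: "a week or more" means 7+ nights
        markers.append(("high", "booking for a week or more"))

    # 3.3.c Type of meal
    if meal_plan == "meal type 2":  # hypothetical label format
        markers.append(("high", "meal type 2 selected"))

    return markers


# Usage example with invented bookings.
print(risk_markers(1, 0, 1, "meal type 1"))
print(risk_markers(2, 2, 8, "meal type 2"))
```

Such rules are only rough flags for managers; they do not replace the probability estimated by the prediction model.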