Predicting Hotel Cancellations

🏨 Background

You are supporting a hotel with a project aimed at increasing revenue from their room bookings. They believe that they can use data science to help them reduce the number of cancellations. This is where you come in!

They have asked you to use any appropriate methodology to identify what contributes to whether a booking will be fulfilled or cancelled. They intend to use the results of your work to reduce the chance someone cancels their booking.

The Data

They have provided you with their bookings data in a file called hotel_bookings.csv, which contains the following:

Column	Description
`Booking_ID`	Unique identifier of the booking.
`no_of_adults`	The number of adults.
`no_of_children`	The number of children.
`no_of_weekend_nights`	Number of weekend nights (Saturday or Sunday).
`no_of_week_nights`	Number of week nights (Monday to Friday).
`type_of_meal_plan`	Type of meal plan included in the booking.
`required_car_parking_space`	Whether a car parking space is required.
`room_type_reserved`	The type of room reserved.
`lead_time`	Number of days before the arrival date the booking was made.
`arrival_year`	Year of arrival.
`arrival_month`	Month of arrival.
`arrival_date`	Date of the month for arrival.
`market_segment_type`	How the booking was made.
`repeated_guest`	Whether the guest has previously stayed at the hotel.
`no_of_previous_cancellations`	Number of previous cancellations.
`no_of_previous_bookings_not_canceled`	Number of previous bookings that were not canceled.
`avg_price_per_room`	Average price per day of the booking.
`no_of_special_requests`	Count of special requests made as part of the booking.
`booking_status`	Whether the booking was cancelled or not.

Source (data has been modified): https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset

import pandas as pd
hotels = pd.read_csv("data/hotel_bookings.csv")
#hotels = pd.read_csv("Hotel Reservations.csv")
hotels

hotels.columns

hotels.info()

hotels.describe()

Filling missing values in the non-categorical data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from yellowbrick.classifier import ConfusionMatrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV

Create useful statistics
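One statistic that may be useful here is the cancellation rate within each value of a column such as `market_segment_type`. The sketch below is only an illustration: the helper name `cancellation_rate_by` is hypothetical, and it assumes that cancelled bookings are labelled `"Canceled"` in `booking_status`, which should be checked against the actual file. The small `demo` frame is made-up data, not taken from hotel_bookings.csv.

```python
import pandas as pd

def cancellation_rate_by(df, column, status_col="booking_status",
                         cancelled_label="Canceled"):
    """Cancellation rate and booking count for each value of `column`.

    `cancelled_label` is an assumption about how cancellations are encoded.
    """
    # 1 for a cancelled booking, 0 otherwise
    cancelled = (df[status_col] == cancelled_label).astype(int)
    summary = (
        cancelled.groupby(df[column])
        .agg(cancel_rate="mean", bookings="count")
        .sort_values("cancel_rate", ascending=False)
    )
    return summary

# Made-up rows, used only to show the shape of the result
demo = pd.DataFrame({
    "market_segment_type": ["Online", "Online", "Offline", "Offline"],
    "booking_status": ["Canceled", "Not_Canceled", "Not_Canceled", "Not_Canceled"],
})
rates = cancellation_rate_by(demo, "market_segment_type")
print(rates)
```

On the real data the same call would be `cancellation_rate_by(hotels, "market_segment_type")`, and it can be repeated for other columns such as `type_of_meal_plan` or `no_of_special_requests`.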

Create model for prediction
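A minimal sketch of one way a prediction model could be built from these columns, using a decision tree inside a scikit-learn pipeline. It is not necessarily the model used in this notebook. The function name `fit_cancellation_tree`, the `"Canceled"` label, the tree depth, and the 75/25 split are all assumptions, and the sketch assumes missing values have already been filled. The `demo` frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

def fit_cancellation_tree(df, cat_cols, num_cols, target="booking_status",
                          cancelled_label="Canceled", seed=42):
    """Fit a decision tree and return (fitted pipeline, hold-out accuracy)."""
    X = df[cat_cols + num_cols]
    y = (df[target] == cancelled_label).astype(int)  # 1 = cancelled
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # One-hot encode the categorical columns, pass numeric columns through
    preprocess = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
        remainder="passthrough",
    )
    model = Pipeline([
        ("preprocess", preprocess),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=seed)),
    ])
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    return model, acc

# Synthetic data in which long lead times always cancel (illustration only)
rng = np.random.default_rng(0)
lead = rng.integers(0, 300, 200)
demo = pd.DataFrame({
    "market_segment_type": rng.choice(["Online", "Offline"], 200),
    "lead_time": lead,
    "booking_status": np.where(lead > 150, "Canceled", "Not_Canceled"),
})
model, acc = fit_cancellation_tree(demo, ["market_segment_type"], ["lead_time"])
print(f"hold-out accuracy: {acc:.2f}")
```

Keeping the encoder inside the pipeline means it is fitted on the training rows only, so the hold-out score is not influenced by the test rows.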

Comparing different models
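A comparison of several classifiers could be sketched as a loop that fits each one on the same split and records accuracy and ROC AUC. The helper `compare_models`, the choice of models, and the split settings are assumptions made for illustration. The data is synthetic and the features are assumed to be numeric already.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def compare_models(X, y, models, seed=42):
    """Fit each model on one train/test split and tabulate its scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, proba),
        })
    return pd.DataFrame(rows).set_index("model").sort_values(
        "roc_auc", ascending=False
    )

# Synthetic numeric features and target (illustration only)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "lead_time": rng.integers(0, 300, 200),
    "no_of_special_requests": rng.integers(0, 4, 200),
})
y = (X["lead_time"] > 150).astype(int)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive_bayes": GaussianNB(),
}
results = compare_models(X, y, models)
print(results)
```

A single split gives a noisy ranking. Cross-validation would be a more robust basis for choosing between models.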

import datetime as dt

from sklearn.preprocessing import OneHotEncoder, LabelEncoder, label_binarize
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.metrics import RocCurveDisplay

hotels.isnull().sum()

# Import SimpleImputer
from sklearn.impute import SimpleImputer

colss_cat = ['type_of_meal_plan','room_type_reserved','market_segment_type']
colss_numeric = ['no_of_adults', 'no_of_children', 'no_of_weekend_nights',
       'no_of_week_nights', 'required_car_parking_space','lead_time', 'arrival_year', 'arrival_month',
       'arrival_date','repeated_guest','no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled',
       'avg_price_per_room', 'no_of_special_requests']
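With the columns split into categorical and numeric lists, one possible way to fill the missing values is sketched below using `SimpleImputer`. The helper name `impute_bookings` is hypothetical, and the strategies (most frequent value for categorical columns, median for numeric columns) are assumptions rather than the notebook's confirmed choice. The `demo` frame is made-up data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

def impute_bookings(df, cat_cols, num_cols):
    """Return a copy of df with missing values filled in the given columns."""
    out = df.copy()
    # Categorical columns: fill gaps with the most frequent value
    cat_imputer = SimpleImputer(strategy="most_frequent")
    out[cat_cols] = cat_imputer.fit_transform(out[cat_cols].astype(object))
    # Numeric columns: fill gaps with the column median
    num_imputer = SimpleImputer(strategy="median")
    out[num_cols] = num_imputer.fit_transform(out[num_cols])
    return out

# Made-up rows with one gap in each column (illustration only)
demo = pd.DataFrame({
    "type_of_meal_plan": ["Meal Plan 1", np.nan, "Meal Plan 1", "Not Selected"],
    "lead_time": [10.0, np.nan, 30.0, 50.0],
})
demo_filled = impute_bookings(demo, ["type_of_meal_plan"], ["lead_time"])
print(demo_filled)
```

On the real data the call would be `hotels = impute_bookings(hotels, colss_cat, colss_numeric)`. Note that fitting the imputers on the full data set before a train/test split lets test rows influence the fill values. Fitting them on the training rows only avoids this.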

