Hotel booking

  • Travel agents advise on destinations, plan trip itineraries, and make travel arrangements for clients. They sell transportation, lodging, and admission to entertainment activities to individuals and groups planning trips. The travel company therefore wants to see how cancellations differ between resort hotels and city hotels.
  • Read only the ‘is_canceled’ and ‘hotel’ columns, and answer the question with a visualization.
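
One way to read only those two columns is the `usecols` argument of `pd.read_csv`. The snippet below is a minimal sketch, not the method used later in this post: it parses a tiny made-up CSV from memory so that it runs without the real file, and the sample rows are illustrative assumptions only.

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for the real file (illustrative values only).
sample_csv = io.StringIO(
    "hotel,is_canceled,leadtime\n"
    "Resort Hotel,0,10\n"
    "City Hotel,1,45\n"
    "City Hotel,0,3\n"
)

# usecols restricts parsing to the listed columns; the rest are skipped.
two_cols = pd.read_csv(sample_csv, usecols=["hotel", "is_canceled"])
print(two_cols.columns.tolist())
```

With the real data, the same call would take the file name in place of `sample_csv`.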

⚕️ Data Information ℹ️

Column Name: Description
hotel: Indicates the type of accommodation (e.g., "Resort Hotel" or "City Hotel").
is_canceled: Shows whether the reservation was canceled (0: Not Canceled, 1: Canceled).
leadtime: Represents the number of days between the booking date and the arrival date.
arrivalyear: Indicates the year of the customer's arrival.
arrivalmonth: Indicates the month of the customer's arrival.
staysweekendnights: Specifies the weekend nights' duration of stay.
staysweeknights: Specifies the weekdays' duration of stay.
adults, children, babies: Specifies the number of adults, children, and babies in the stay.
meal: Indicates the meal plan chosen by the customer during booking.
country: Represents the customer's country.
market_segment: Specifies the market segment (e.g., "Online TA" or "Offline TA/TO").
distribution_channel: Indicates how the reservation was distributed (e.g., "Direct" or "Corporate").
is_repeated_guest: Indicates whether the customer is a repeated guest (0: No, 1: Yes).
previous_cancellations: Specifies the number of previous reservation cancellations.
booking_changes: Specifies the number of changes made to the reservation.
deposit_type: Specifies the payment type for the reservation.
days_in_waiting_list: Shows the number of days spent on the waiting list.
customer_type: Specifies the customer type (e.g., "Transient" or "Contract").
adr: Represents the average daily rate.
required_car_parking_spaces: Specifies the number of required parking spaces.
total_of_special_requests: Specifies the total number of special requests.

Overview 👓🔭🔬

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("hotelbookingsinfo.csv")
# General Information
def check_data(data,head=5):
    print("##############HEAD###############")
    print(data.head(head))
    print("##############TAIL###############")
    print(data.tail(head))
    print("##############SHAPE###############")
    print(data.shape)
    print("##############INFO###############")
    data.info()  # info() prints its report itself and returns None
    print("##############COLUMNS###############")
    print(data.columns)
    print("##############INDEX##############")
    print(data.index)
    print("##############DESCRIBE########")
    print(data.describe().T)
    print(" #############NaN############# ")
    print(data.isnull().values.any())
    print(data.isnull().sum())
check_data(data)

🤩🤩🤩🔎Solution and Conclusion🔍🤩🤩🤩

# Select 'is_canceled' and 'hotel' columns
data_selected = data[['is_canceled', 'hotel']]

# Count reservations per hotel type and cancellation status
grouped_data = data_selected.groupby(['hotel', 'is_canceled']).size().unstack()

# Plot the counts as a stacked bar chart
ax = grouped_data.plot(kind='bar', stacked=True, figsize=(10, 6))
ax.set_ylabel("Total Number of Reservations")
ax.set_xlabel("Hotel Type")
ax.set_title("Cancellation Status by Hotel Type")

plt.show()
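
Raw counts can hide the difference when the two hotel types have very different booking volumes, so it may help to compare cancellation rates as well. The sketch below uses a small made-up DataFrame (the rows are illustrative assumptions, not values from the dataset) to show two equivalent ways of getting the share of canceled reservations per hotel type.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
demo = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 0],
})

# is_canceled is coded 0/1, so its mean per group is the cancellation rate.
cancel_rate = demo.groupby("hotel")["is_canceled"].mean()

# A row-normalized crosstab gives the share of each status within a hotel type.
share = pd.crosstab(demo["hotel"], demo["is_canceled"], normalize="index")

print(cancel_rate)
print(share)
```

On the real data, `share.plot(kind='bar', stacked=True)` would draw the same kind of chart as above, with every bar scaled to 1 so the two hotel types are directly comparable.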

EXTRA ANALYSIS AND EXPLORATION (BONUS)

1. Handling Missing Values:

# Filling missing values in 'country' with the most frequent country
# (assign the result back; inplace=True on a selected column is deprecated in recent pandas)
data['country'] = data['country'].fillna(data['country'].mode()[0])

# Handling 'agent' and 'company' columns based on analysis goals
# For example, filling missing values with 0
data['agent'] = data['agent'].fillna(0)
data['company'] = data['company'].fillna(0)
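
Before choosing a fill strategy, it can be useful to see how much of each column is actually missing. The following is a minimal sketch on a made-up DataFrame (the values are illustrative assumptions): it reports the missing share per column, then applies the same mode and zero fills by assigning the result back to the column.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "country": ["PRT", "PRT", None, "GBR"],
    "agent": [9.0, np.nan, np.nan, 14.0],
})

# Fraction of missing values per column (mean of the boolean null mask).
missing_share = demo.isnull().mean()
print(missing_share)

# Fill by assignment rather than inplace=True on a selected column.
demo["country"] = demo["country"].fillna(demo["country"].mode()[0])
demo["agent"] = demo["agent"].fillna(0)

print(demo.isnull().sum())
```

A column that is mostly empty may be better dropped or turned into a "has value" flag than filled, depending on the analysis goal.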

2. Outlier Analysis:

# Outlier analysis for numerical columns
numerical_columns = ['leadtime', 'staysweekendnights', 'staysweeknights', 'adults', 'children', 
                     'babies', 'previous_cancellations', 'booking_changes', 'days_in_waiting_list', 
                     'adr', 'required_car_parking_spaces', 'total_of_special_requests']

# Visualizing boxplots for outlier detection in a single figure
plt.figure(figsize=(15, 10))

for i, column in enumerate(numerical_columns, 1):
    plt.subplot(3, 4, i)  # 12 columns fit a 3 x 4 grid with no empty slots
    sns.boxplot(x=data[column])
    plt.title(column)

plt.tight_layout()
plt.show()
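
The boxplots show outliers visually; to put numbers on them, one option is to count the values outside the 1.5 × IQR fences, which is the rule boxplot whiskers follow by default. The helper `iqr_outlier_counts` below is a hypothetical name introduced for this sketch, and the DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    "leadtime": [1, 2, 3, 4, 5, 6, 7, 8, 9, 500],
    "adr": [80.0, 90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0, 125.0, 130.0],
})

def iqr_outlier_counts(df, columns, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    counts = {}
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        outside = (df[col] < lower) | (df[col] > upper)
        counts[col] = int(outside.sum())
    return pd.Series(counts)

outlier_counts = iqr_outlier_counts(demo, ["leadtime", "adr"])
print(outlier_counts)
```

On the real data, the same helper could be called with `data` and `numerical_columns` to rank the columns by how many points fall outside the fences.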

3. Time Series Analysis: