Skip to content

Introduction

In this project we will be importing, cleaning, and analyzing the data from 2023 and buidling a predictive model for 2024 revenue based on booked nights.

Summary:

1. Data Distribution and Trends:

The average amount spent by guests and the average number of nights stayed were calculated, providing a baseline for understanding guest spending behavior. Most guests spent €400 or less, indicating a concentration of short-term, budget-conscious bookings.

2. Spending Patterns:

Guests were categorized into different spending groups, revealing how guest expenditure is distributed. This segmentation helps in understanding different customer profiles. The histogram of the amount spent showed a skewed distribution, suggesting that while most guests spend lower amounts, there is a significant number of higher-value bookings.

3. Guest-Specific Insights:

The analysis identified the guests who spent the most and the least, offering a glimpse into the range of customer spending on the platform.

4. Correlation Between Spending and Nights Stayed:

A moderate positive correlation (0.68) was found between the amount spent and the number of nights stayed, indicating that longer stays tend to correspond with higher expenditures.

5.Predictive Modeling for 2024:

A linear regression model predicted 2024 revenue based on the number of nights booked. The model's prediction indicated a range of expected revenue with a 95% confidence interval, providing a quantified forecast for 2024.

6. Scenario Analysis:

The scenario analysis with 1000 simulations offered a probabilistic view of potential revenue outcomes, enhancing the robustness of the forecast.

7. Evaluation of Model Performance:

The model's performance was assessed using the mean squared error, ensuring the reliability of the predictions.

Task:

  • Import the data from airbnb2023.xlsx,
  • Clean and analyise the data
  • Make a predictive model for 2024. revenue and nights booked.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Read the Excel file and store it in the 'airbnb' dataframe
airbnb = pd.read_excel("Airbnb2023.xlsx", index_col=0)

# Display the contents of the 'airbnb' dataframe
display(airbnb)

# Drop the columns 'Unnamed: 9' and 'PDV' from the 'airbnb' dataframe
airbnb = airbnb.drop(["Unnamed: 9","PDV"], axis=1)
airbnb = airbnb.dropna()

# Display information about the airbnb dataframe
airbnb.info()

# Display the updated contents of the 'airbnb' dataframe
airbnb.head()
# Calculate descriptive statistics for the "Amount" column in the airbnb dataframe
airbnb_amount = airbnb["Amount"].describe()
airbnb_amount
# Calculate the descriptive statistics for the "Nights" column in the airbnb dataframe
airbnb_nights = airbnb["Nights"].describe()
airbnb_nights
# Calculate the total amount in the airbnb dataframe
total_amount = airbnb["Amount"].sum()

# Calculate the total nights in the airbnb dataframe
total_nights = airbnb["Nights"].sum()

total_amount, total_nights
# Divide guests into groups based on the amount they spent
airbnb['Amount Spent €'] = pd.cut(airbnb['Amount'], bins=[200,400,600,800,1000, np.inf], labels=['0-200', '200-400', '400-600', '600-800', '1000+'])

# Count the number of guests in each group
guests_per_group = pd.DataFrame(airbnb['Amount Spent €'].value_counts())

# Rename the index column to "Amount Spent" and the second column to "Guests"
guests_per_group.columns = ["Guests number"]
guests_per_group.index.name = "Amount Spent €"

# Display the number of guests in each group
print(guests_per_group)

# Create a bar plot to visualize the number of guests in each group
plt.figure(figsize=(10, 6))
sns.barplot(x=guests_per_group.index, y=guests_per_group["Guests number"])
plt.xlabel("Amount Spent €")
plt.ylabel("Number of Guests")
plt.title("Number of Guests in Each Amount Spent Group")
plt.show()
# Plotting a histogram of the 'Amount' column
plt.figure(figsize=(10, 6))
plt.hist(airbnb['Amount'], bins=5, color='mediumseagreen')
plt.xlabel('Amount')
plt.ylabel('Frequency')

# Add a vertical line for the mean value of the 'Amount' column
plt.axvline(airbnb['Amount'].mean(), color='red', linestyle='dashed', linewidth=2)

# Add a vertical line for the median value of the 'Amount' column
plt.axvline(airbnb['Amount'].median(), color='blue', linestyle='dashed', linewidth=2)

# Add a legend for the mean and median lines
plt.legend(['Mean', 'Median'])

# Display the plot
plt.show()
  • Most guests spent 400 or less € during their stay
# Group the data by guest and calculate the sum of amount
guest_data = airbnb.groupby('Guest').agg({'Amount': 'sum', 'Nights': 'sum'})

# Convert the guest names to integers
guest_data.index = guest_data.index.astype(str)

# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(20, 10))

# Plot the guest data as a bar chart with explicit x-axis ticks starting from 0
x_ticks = range(len(guest_data))
ax1.bar(x_ticks, guest_data['Amount'], color='mediumseagreen', tick_label=guest_data.index)
ax1.set_ylabel('Amount')

# Set the x-axis labels to be rotated
ax1.set_xticks(x_ticks)
ax1.set_xticklabels(guest_data.index, rotation=60, ha="right")

# Create a second y-axis for nights stayed
ax2 = ax1.twinx()
ax2.plot(x_ticks, guest_data['Nights'], marker='o', color='red')
ax2.set_ylabel('Nights')

# Set the title
plt.title('Amount Spent and Nights Stayed by Guest')

# Adjust the layout to prevent overlapping labels
plt.tight_layout()

plt.show()

The guest that spent the most money:

# Find the guest who spent the most money
most_spent_guest = guest_data['Amount'].idxmax()

# Get the amount spent by the guest
most_spent_amount = guest_data.loc[most_spent_guest, 'Amount']

most_spent_guest, most_spent_amount

The guest that spent the least money: