Practical Exam: Hotel Operations
LuxurStay Hotels is a major, international chain of hotels. They offer hotels for both business and leisure travellers in major cities across the world. The chain prides themselves on the level of customer service that they offer.
However, the management has been receiving complaints about slow room service in some hotel branches. As these complaints are impacting the customer satisfaction rates, it has become a serious issue. Recent data shows that customer satisfaction has dropped from the 4.5 rating that they expect.
You are working with the Head of Operations to identify possible causes and hotel branches with the worst problems.
Data
The following schema diagram shows the tables available. You have only been provided with data where customers provided a feedback rating.
Task 1
Before you can start any analysis, you need to confirm that the data is accurate and reflects what you expect to see.
It is known that there are some issues with the branch table, and the data team have provided the following data description.
Write a query to return data matching this description. You must match all column names and description criteria.
| Column Name | Criteria |
|---|---|
| id | Nominal. The unique identifier of the hotel. Missing values are not possible due to the database structure. |
| location | Nominal. The location of the particular hotel. One of four possible values, 'EMEA', 'NA', 'LATAM' and 'APAC'. Missing values should be replaced with “Unknown”. |
| total_rooms | Discrete. The total number of rooms in the hotel. Must be a positive integer between 1 and 400. Missing values should be replaced with the default number of rooms, 100. |
| staff_count | Discrete. The number of staff employeed in the hotel service department. Missing values should be replaced with the total_rooms multiplied by 1.5. |
| opening_date | Discrete. The year in which the hotel opened. This can be any value between 2000 and 2023. Missing values should be replaced with 2023. |
| target_guests | Nominal. The primary type of guest that is expected to use the hotel. Can be one of 'Leisure' or 'Business'. Missing values should be replaced with 'Leisure'. |
-- This query selects specific columns from the 'branch' table and performs data cleaning and transformation operations on them.
SELECT
id, -- Selecting the unique identifier for the branch
-- Cleaning and standardizing the 'location' data by converting to uppercase, removing extra spaces, and replacing NULL values with 'Unknown'
UPPER(TRIM(COALESCE(location, 'Unknown'))) AS location,
-- Handling outliers in 'total_rooms' by setting values less than 1 or greater than 400 to 100
CASE
WHEN total_rooms IS NULL OR total_rooms < 1 OR total_rooms > 400 THEN 100
ELSE total_rooms
END AS total_rooms,
-- Handling missing values in 'staff_count' by setting it to 1.5 times the 'total_rooms' value if it's NULL
COALESCE(staff_count, total_rooms * 1.5) AS staff_count,
-- Cleaning 'opening_date' by setting NULL values, non-4-digit values, or values outside the range 2000-2023 to '2023'
CASE
WHEN opening_date IS NULL OR LENGTH(opening_date) != 4 OR CAST(opening_date AS INTEGER) < 2000 OR CAST(opening_date AS INTEGER) > 2023 THEN '2023'
ELSE opening_date
END AS opening_date,
-- Cleaning the 'target_guests' column by mapping variations to 'Business' or 'Leisure'
CASE
WHEN UPPER(TRIM(target_guests)) IN ('B.', 'B', 'BUSINESS', 'BUSNIESS') THEN 'Business'
WHEN UPPER(TRIM(target_guests)) = 'LEISURE' THEN 'Leisure'
ELSE 'Leisure' -- Default case if none of the above conditions are met
END AS target_guests
FROM
branch; -- Selecting data from the 'branch' tableTask 2
The Head of Operations wants to know whether there is a difference in time taken to respond to a customer request in each hotel. They already know that different services take different lengths of time.
Calculate the average and maximum duration for each branch and service. Your output should include the columns service_id, branch_id, avg_time_taken and max_time_taken. Values should be rounded to two decimal places where appropriate.
-- This query calculates average and maximum time taken for each service at each branch from the 'Request' table.
SELECT
service_id, -- Selecting the service identifier
branch_id, -- Selecting the branch identifier
-- Calculating the rounded average time taken for each service at each branch to two decimal places
ROUND(AVG(time_taken), 2) AS avg_time_taken,
-- Finding the maximum time taken for each service at each branch
MAX(time_taken) AS max_time_taken
FROM
Request -- Selecting data from the 'Request' table
GROUP BY
service_id, branch_id; -- Grouping the results by service_id and branch_id to calculate averages and maximums for each service at each branchTask 3
The management team want to target improvements in Meal and Laundry service in Europe (EMEA) and Latin America (LATAM).
Write a query to return the description of the service, the id and location of the branch, the id of the request as request_id and the rating for the services and locations of interest to the management team.
Use the original branch table, not the output of task 1.
-- This query retrieves specific information about services, branches, requests, and ratings based on certain filters.
SELECT
S.description, -- Selects service description from the Service table.
B.id AS branch_id, -- Selects branch id from the Branch table.
B.location, -- Selects branch location from the Branch table.
R.id AS request_id, -- Selects request id from the Request table.
R.rating -- Selects service rating from the Request table.
FROM
Service S -- Selecting data from the Service table
JOIN
Request R ON S.id = R.service_id -- Joining Service and Request tables by service id
JOIN
Branch B ON R.branch_id = B.id -- Joining Request and Branch tables by branch id
WHERE
B.location IN ('EMEA', 'LATAM') -- Filtering for branches in EMEA and LATAM regions
AND S.description IN ('Meal', 'Laundry'); -- Filtering for Meal and Laundry servicesTask 4
So that you can take a more detailed look at the lowest performing hotels, you want to get service and branch information where the average rating for the branch and service combination is lower than 4.5 - the target set by management.
Your query should return the service_id and branch_id, and the average rating (avg_rating), rounded to 2 decimal places.
-- This query calculates the average rating for each service at each branch from the 'Request' table and filters the results.
SELECT
service_id, -- Selecting the service identifier
branch_id, -- Selecting the branch identifier
-- Calculating the rounded average rating for each service at each branch to two decimal places
ROUND(AVG(rating), 2) AS avg_rating
FROM
Request -- Selecting data from the 'Request' table
GROUP BY
service_id, branch_id -- Grouping the results by service_id and branch_id to calculate average ratings for each service at each branch
HAVING
AVG(rating) < 4.5; -- Filtering the results to include only those with an average rating less than 4.5