Skip to content

Practical Exam: Hotel Operations

LuxurStay Hotels is a major, international chain of hotels. They offer hotels for both business and leisure travellers in major cities across the world. The chain prides themselves on the level of customer service that they offer.

However, the management has been receiving complaints about slow room service in some hotel branches. As these complaints are impacting the customer satisfaction rates, it has become a serious issue. Recent data shows that customer satisfaction has dropped from the 4.5 rating that they expect.

You are working with the Head of Operations to identify possible causes and hotel branches with the worst problems.

Data

The following schema diagram shows the tables available. You have only been provided with data where customers provided a feedback rating.

Task 1

Before you can start any analysis, you need to confirm that the data is accurate and reflects what you expect to see.

It is known that there are some issues with the branch table, and the data team have provided the following data description.

Write a query to return data matching this description. You must match all column names and description criteria.

Column NameCriteria
idNominal. The unique identifier of the hotel.
Missing values are not possible due to the database structure.
locationNominal. The location of the particular hotel. One of four possible values, 'EMEA', 'NA', 'LATAM' and 'APAC'.
Missing values should be replaced with “Unknown”.
total_roomsDiscrete. The total number of rooms in the hotel. Must be a positive integer between 1 and 400.
Missing values should be replaced with the default number of rooms, 100.
staff_countDiscrete. The number of staff employeed in the hotel service department.
Missing values should be replaced with the total_rooms multiplied by 1.5.
opening_dateDiscrete. The year in which the hotel opened. This can be any value between 2000 and 2023.
Missing values should be replaced with 2023.
target_guestsNominal. The primary type of guest that is expected to use the hotel. Can be one of 'Leisure' or 'Business'.
Missing values should be replaced with 'Leisure'.
Spinner
DataFrameas
clean_branch_data
variable

-- This query selects specific columns from the 'branch' table and performs data cleaning and transformation operations on them.

SELECT
  id, -- Selecting the unique identifier for the branch

  -- Cleaning and standardizing the 'location' data by converting to uppercase, removing extra spaces, and replacing NULL values with 'Unknown'
  UPPER(TRIM(COALESCE(location, 'Unknown'))) AS location,

  -- Handling outliers in 'total_rooms' by setting values less than 1 or greater than 400 to 100
  CASE 
    WHEN total_rooms IS NULL OR total_rooms < 1 OR total_rooms > 400 THEN 100 
    ELSE total_rooms 
  END AS total_rooms,

  -- Handling missing values in 'staff_count' by setting it to 1.5 times the 'total_rooms' value if it's NULL
  COALESCE(staff_count, total_rooms * 1.5) AS staff_count,

  -- Cleaning 'opening_date' by setting NULL values, non-4-digit values, or values outside the range 2000-2023 to '2023'
  CASE 
    WHEN opening_date IS NULL OR LENGTH(opening_date) != 4 OR CAST(opening_date AS INTEGER) < 2000 OR CAST(opening_date AS INTEGER) > 2023 THEN '2023' 
    ELSE opening_date 
  END AS opening_date,

  -- Cleaning the 'target_guests' column by mapping variations to 'Business' or 'Leisure'
  CASE
    WHEN UPPER(TRIM(target_guests)) IN ('B.', 'B', 'BUSINESS', 'BUSNIESS') THEN 'Business' 
    WHEN UPPER(TRIM(target_guests)) = 'LEISURE' THEN 'Leisure' 
    ELSE 'Leisure' -- Default case if none of the above conditions are met
  END AS target_guests

FROM
  branch; -- Selecting data from the 'branch' table

Task 2

The Head of Operations wants to know whether there is a difference in time taken to respond to a customer request in each hotel. They already know that different services take different lengths of time.

Calculate the average and maximum duration for each branch and service. Your output should include the columns service_id, branch_id, avg_time_taken and max_time_taken. Values should be rounded to two decimal places where appropriate.

Spinner
DataFrameas
average_time_service
variable

-- This query calculates average and maximum time taken for each service at each branch from the 'Request' table.

SELECT
  service_id, -- Selecting the service identifier
  branch_id, -- Selecting the branch identifier

  -- Calculating the rounded average time taken for each service at each branch to two decimal places
  ROUND(AVG(time_taken), 2) AS avg_time_taken,

  -- Finding the maximum time taken for each service at each branch
  MAX(time_taken) AS max_time_taken

FROM
  Request -- Selecting data from the 'Request' table

GROUP BY
  service_id, branch_id; -- Grouping the results by service_id and branch_id to calculate averages and maximums for each service at each branch

Task 3

The management team want to target improvements in Meal and Laundry service in Europe (EMEA) and Latin America (LATAM).

Write a query to return the description of the service, the id and location of the branch, the id of the request as request_id and the rating for the services and locations of interest to the management team.

Use the original branch table, not the output of task 1.

Spinner
DataFrameas
target_hotels
variable

-- This query retrieves specific information about services, branches, requests, and ratings based on certain filters.

SELECT
  S.description,   -- Selects service description from the Service table.
  B.id AS branch_id,   -- Selects branch id from the Branch table.
  B.location,   -- Selects branch location from the Branch table.
  R.id AS request_id,   -- Selects request id from the Request table.
  R.rating   -- Selects service rating from the Request table.

FROM
  Service S   -- Selecting data from the Service table
JOIN
  Request R ON S.id = R.service_id   -- Joining Service and Request tables by service id
JOIN
  Branch B ON R.branch_id = B.id   -- Joining Request and Branch tables by branch id

WHERE
  B.location IN ('EMEA', 'LATAM')   -- Filtering for branches in EMEA and LATAM regions
  AND S.description IN ('Meal', 'Laundry');   -- Filtering for Meal and Laundry services

Task 4

So that you can take a more detailed look at the lowest performing hotels, you want to get service and branch information where the average rating for the branch and service combination is lower than 4.5 - the target set by management.

Your query should return the service_id and branch_id, and the average rating (avg_rating), rounded to 2 decimal places.

Spinner
DataFrameas
average_rating
variable

-- This query calculates the average rating for each service at each branch from the 'Request' table and filters the results.

SELECT
  service_id, -- Selecting the service identifier
  branch_id, -- Selecting the branch identifier

  -- Calculating the rounded average rating for each service at each branch to two decimal places
  ROUND(AVG(rating), 2) AS avg_rating

FROM
  Request -- Selecting data from the 'Request' table

GROUP BY
  service_id, branch_id -- Grouping the results by service_id and branch_id to calculate average ratings for each service at each branch

HAVING
  AVG(rating) < 4.5; -- Filtering the results to include only those with an average rating less than 4.5