Project Objective
Maintaining records of requests for conducting research and their processing is crucial for several reasons:
-
Traceability: Thorough tracking of all research requests and their processing ensures a clear and auditable history of each sample's journey through the laboratory.
-
Communication: Records serve as a communication tool, informing all laboratory staff about the status of research requests and their processing, fostering collaboration and increasing efficiency.
-
Laboratory Efficiency Assessment: The collected data provides valuable statistical information about laboratory services, facilitating the identification of trends, strengths, and areas needing improvement.
Task: Processing the query table to obtain convenient statistics and analyze the quality of services provided by the laboratory.
Query Table Description
The Query Table functions as a comprehensive repository of all research requests processed in the laboratory, encompassing the following columns:
- Initiation: Date when the research request was initiated.
- Customer: Name or identifier of the customer submitting the research request.
- Project: Project or study associated with the research request.
- ID: Unique identifier assigned to the sample.
- Parameter: Measurable parameter of the sample being analyzed.
- Method: Methodology used for the analysis of the sample.
- Reporting: Date when the analysis results were reported.
- Note: Additional notes or comments related to the research request or sample analysis.
Preliminary Processing of Query Table in R
## Loading Required Libraries
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("bizdays")) install.packages("bizdays")
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("lag", "dplyr")
library("tidyverse")
library("bizdays")
library("readxl")
library("DT")
## Loading Query Table
excel_file <- read_excel('Query_Table.xlsx')
## Data Cleaning
report <-
excel_file %>%
filter(!str_detect(Note,"Canceled") | is.na(Note)) %>%
transmute(Initiation = as.Date(Initiation), Customer, Project,
Service = paste0(Parameter,"(", Method,")"),
Reporting = as.Date(Reporting))
glimpse(report)## Creating a Calendar for the Required Range
russian_holidays <- as.Date(c("2022-05-02", "2022-05-03", "2022-05-09", "2022-05-10", "2022-06-13",
"2022-11-04", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05",
"2023-01-06", "2023-02-23", "2023-02-24", "2023-03-08", "2023-05-01",
"2023-05-08", "2023-05-09", "2023-06-12", "2023-11-06"
))
cal <- create.calendar("RUS",
holidays = russian_holidays,
weekdays = c("saturday", "sunday"),
start.date = "2022-04-01",
end.date = "2023-12-31"
)
## Turnaround Time (TAT)
report$TAT <- bizdays(report$Initiation, report$Reporting, cal)
report$TAT[report$TAT < 0 & !is.na(report$TAT)] <- 0
glimpse(report)Data Analysis Tasks:
-
List of Services for a Specified Period: Obtain a comprehensive list of services requested during a specific period, along with the percentage of fulfillment. This will help us understand the volume of services required by clients over a fixed period.
-
Cumulative Flow Diagram of Requests and Results Delivery: By visualizing the cumulative flow of research requests over time, we can identify bottlenecks or fluctuations in demand. This diagram provides insights into workload distribution and aids in capacity planning.
-
Statistics on Results Delivery Time: Calculate various statistical metrics for the time taken to deliver results for different services. This analysis will give us an overview of the efficiency of our process and highlight areas for improvement.
-
Distribution Plot of Results Delivery Time: A histogram helps visualize the distribution of result delivery times for different services. This will help us identify any deviations or patterns in delivery time and assess the overall effectiveness of service delivery.
## List of services for a specified period
report_filter <- report %>%
filter(Customer %in% c("Customer-01", "Customer-04", "Customer-07", "Customer-10")) %>%
filter(Service %in% c("Parameter-01(Method-01)", "Parameter-07(Method-04)",
"Parameter-20(Method-10)", "Parameter-38(Method-16)"))
datatable(report_filter %>%
filter( as.Date("2023-09-01") <= Initiation , Initiation <= as.Date("2023-11-30") ) %>%
group_by(Customer, Service) %>%
summarise(Ordered = n(),
Done = sum(!is.na(Reporting)), .groups = "drop") %>%
transmute(Customer, Service,
"Ordered Services" = Ordered,
"Provided Services" = paste0(round(Done/Ordered*100,2),"%")
)
)## Cumulative flow diagram of research requests
CFD_IN <- report_filter %>%
group_by(Initiation) %>%
summarise(total = n()) %>%
mutate(total = cumsum(total)) %>%
ungroup
CFD_OUT <- report_filter %>%
group_by(Reporting) %>%
summarise(total = n()) %>%
mutate(total = cumsum(total)) %>%
ungroup %>%
filter(!is.na(Reporting))
CFD_NOW_IN <- CFD_IN %>%
bind_rows(CFD_IN %>%
slice_tail(n = 1) %>%
mutate(Initiation = as.Date("2024-01-15"))
)
CFD_NOW_OUT <- CFD_OUT %>%
bind_rows(CFD_OUT %>%
slice_tail(n = 1) %>%
mutate(Reporting = as.Date("2024-01-15"))
)
ggplot() +
geom_area(data = CFD_NOW_IN, aes(x = Initiation, y = total, fill = "Ordered Services"),
alpha = 0.5, color = "darkred", linewidth = 1) +
geom_area(data = CFD_NOW_OUT, aes(x = Reporting, y = total, fill = "Provided Services"),
alpha = 0.5, color = "darkgreen", linewidth = 1) +
scale_fill_manual(name = NULL, values = c("Ordered Services" = "lightcoral",
"Provided Services" = "lightgreen")) +
labs(
x = "Date",
y = "Lab Services") +
scale_x_date(breaks = "3 month", date_labels = "%b %Y") +
geom_vline(xintercept = as.Date("2023-09-01"), linetype = "dashed", color = "dodgerblue", linewidth = 1) +
geom_vline(xintercept = as.Date("2023-11-30"), linetype = "dashed", color = "dodgerblue", linewidth = 1) +
theme_minimal()+
theme(
axis.text = element_text(size = 10, face = "bold"),
axis.title = element_text(size = 15),
legend.text = element_text(size = 10, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1))## Statistics on the time of result delivery
datatable(
report_filter %>%
summarise(
"Ordered Services" = round(sum(!is.na(Initiation)), 0),
"Provided Services" = round(sum(!is.na(Reporting)), 0),
"Mean" = round(mean(TAT, na.rm = TRUE),2),
"StDev" = round(sd(TAT, na.rm = TRUE), 2),
"Minimum" = min(TAT, na.rm = TRUE),
"First Quartile (Q1)" = quantile(TAT, 0.25, na.rm = TRUE),
"Mediana" = median(TAT, na.rm = TRUE),
"Third Quartile (Q3)" = quantile(TAT, 0.75, na.rm = TRUE),
"Maximum" = max(TAT, na.rm = TRUE)
) %>%
pivot_longer(everything(), names_to = "names", values_to = "values"),
colnames = NULL
)## Distribution plot of result delivery time
if (sum(!is.na(report_filter$TAT)) == 0) {
ggplot() +
theme_void() +
annotate("text", x = 0.5, y = 0.5,
label = "There were no results during this period",
size = 10, color = "lightcoral")
}
ggplot(report_filter %>% filter(!is.na(TAT)), aes(x = TAT)) +
geom_histogram(fill = "dodgerblue", color = "darkblue", binwidth = 3) +
scale_x_continuous(breaks = seq(0, max(report_filter$TAT, na.rm = TRUE), by = 3)) +
labs(
x = "Number of working days",
y = "Lab Services") +
theme_minimal() +
theme(
axis.text = element_text(size = 10, face = "bold"),
axis.title = element_text(size = 15)) Interactive Approach
All the aforementioned data analysis tasks were demonstrated using specific Query Table filtering settings. For enhanced usability, an interactive Shiny application was developed.
In the Shiny application, users will be able to:
- Explore the data analysis results for the query table using various filtering options.
- Analyze their own table with identical columns.