Identify issues in a manufacturing process using SQL window functions and subqueries.
Manufacturing processes vary, and ensuring consistent quality requires careful monitoring and adjustment. Sometimes, the adjustments can be too extreme and may have the opposite effect, reducing quality. Apply SQL window functions, partitioning and summary statistics to manufacturing data to identify if any adjustments were wrong for the company.
Manufacturing processes for any product is like putting together a puzzle. Products are pieced together step by step, and keeping a close eye on the process is important.
Analyze the manufacturing_parts table and determine whether the manufacturing process is performing within acceptable control limits.
For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses data to determine whether the process works well. Processes are only adjusted if measurements fall outside of an acceptable range.
This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:
The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range and identify any points in the process that fall outside of the range and therefore require adjustments. This will ensure a smooth running manufacturing process consistently making high-quality products.
The data
The data is available in the manufacturing_parts
table which has the following fields:
item_no
: the item numberlength
: the length of the item madewidth
: the width of the item madeheight
: the height of the item madeoperator
: the operating machine
Create an alert that flags whether the height of a product is within the control limits for each operator using the formulas provided in the workbook.
- The final query should return the following fields: operator, row_number, height, avg_height, stddev_height, ucl, lcl, alert, and be ordered by the item_no.
- The alert column will be your boolean flag.
- Use a window function of length 5 to calculate the control limits, considering rows up to and including the current row; incomplete windows should be excluded from the final query output.
SELECT * FROM information_schema.columns
WHERE table_name = 'manufacturing_parts'
SELECT COUNT(*) FROM manufacturing_parts
SELECT * FROM manufacturing_parts
SELECT AVG(height) AS avg_height
FROM manufacturing_parts
SELECT STDDEV(height) AS stddev_height
FROM manufacturing_parts;
1. Calculate moving average and moving standard deviation
You'll need to create a subquery that is partitioned by operator, ordered by item_no and calculates the average and standard deviation of height over a window that includes rows starting 4 rows before the current row up to the current row (One way to create a window of length 5 is to use the rows 4 PRECEDING from the current row and the CURRENT ROW); alias the fields as avg_height and stddev_height.
SELECT
operator,
ROW_NUMBER() OVER (PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS row_number,
height,
AVG(height) OVER (PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height,
STDDEV(height) OVER (PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height
FROM manufacturing_parts
-- simplified query from above
SELECT
operator,
ROW_NUMBER() OVER w AS row_number,
height,
AVG(height) OVER w AS avg_height,
STDDEV(height) OVER w AS stddev_height
FROM manufacturing_parts
WINDOW w AS (
PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
Remember the criteria:
This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:
The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
2. Calculate upper and lower control limits
Using the previous query as a subquery, create two new columns by using the average and standard deviation columns from the subquery and applying them to the formula shown to calculate the upper and lower limits, aliased as ucl and lcl respectively.
SELECT
a.*,
a.avg_height + 3*a.stddev_height/SQRT(5) AS ucl,
a.avg_height - 3*a.stddev_height/SQRT(5) AS lcl
FROM (
SELECT
operator,
ROW_NUMBER() OVER w AS row_number,
height,
AVG(height) OVER w AS avg_height,
STDDEV(height) OVER w AS stddev_height
FROM manufacturing_parts
WINDOW w AS (
PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
) AS a
WHERE a.row_number >= 5
3. Creating an alert to evaluate the manufacturing process
Use your previous work as a subquery, you need to add a new boolean (true/false) alert field that has checked each value of height against theucl and lcl fields you have just created. You'll then need to add a filter to remove incomplete window rows.
One way to compare height to ucl and lcl might be to use a CASE statement that checks whether a value is or is not BETWEEN two other values and flagging them as either TRUE or FALSE.