
You successfully completed your project and are looking for some additional related challenges. This DataLab workbook contains the official solution from our curriculum staff, along with Additional Challenges at the bottom.

Good luck with your additional challenges!

Manufacturing a product is like putting together a puzzle. Products are pieced together step by step, and keeping a close eye on each step of the process is important.

For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses data to determine whether a process is working well. The process is adjusted only when measurements fall outside an acceptable range.

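This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL). For each part, both limits are calculated from the average height (avg_height) and the standard deviation of height (stddev_height) over a window of five parts. The formulas are:

$$\text{UCL} = \text{avg\_height} + 3 \cdot \frac{\text{stddev\_height}}{\sqrt{5}}$$

$$\text{LCL} = \text{avg\_height} - 3 \cdot \frac{\text{stddev\_height}}{\sqrt{5}}$$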

The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
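The following minimal Python sketch makes the formulas concrete by computing the limits for a single window of five heights. The heights are made up for illustration. The sketch also assumes that the SQL STDDEV used later is the sample standard deviation (n - 1 denominator). This holds in PostgreSQL, where STDDEV is an alias for stddev_samp.

```python
import math
import statistics

# Hypothetical heights of five consecutive parts from one operator (made-up values)
window = [19.0, 20.0, 21.0, 20.0, 20.0]

avg_height = statistics.mean(window)
# Sample standard deviation (n - 1 denominator), assumed to match SQL STDDEV
stddev_height = statistics.stdev(window)

# Control limits: mean plus/minus three standard errors of a 5-part window
ucl = avg_height + 3 * (stddev_height / math.sqrt(5))
lcl = avg_height - 3 * (stddev_height / math.sqrt(5))

print(f"avg={avg_height:.4f} stddev={stddev_height:.4f} lcl={lcl:.4f} ucl={ucl:.4f}")
# avg=20.0000 stddev=0.7071 lcl=19.0513 ucl=20.9487
```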

Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range. You'll also identify any points in the process that fall outside it and therefore require adjustment. This helps keep the manufacturing process running smoothly and consistently producing high-quality products.

The data

The data is available in the manufacturing_parts table, which has the following fields:

  • item_no: the item number
  • length: the length of the item made
  • width: the width of the item made
  • height: the height of the item made
  • operator: the operating machine
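The queries below never look at the whole table at once. Their window functions work on one operator's rows at a time, ordered by item_no. This small Python sketch runs on made-up rows for a single hypothetical operator. It mimics PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW, together with ROW_NUMBER(). It also shows why the first four rows have incomplete windows.

```python
# Hypothetical (item_no, height) rows for ONE operator, deliberately unsorted
rows = [(3, 20.4), (1, 19.8), (5, 20.1), (2, 20.0), (4, 19.9), (6, 20.3)]

# ORDER BY item_no inside the operator's partition
rows.sort(key=lambda r: r[0])

frames = []
for i in range(len(rows)):
    row_number = i + 1  # ROW_NUMBER() restarts at 1 for each operator
    # ROWS BETWEEN 4 PRECEDING AND CURRENT ROW: up to five rows ending at this one
    frame = [height for _, height in rows[max(0, i - 4): i + 1]]
    frames.append((row_number, frame))
    print(row_number, frame)

# Only frames holding all five rows are kept by WHERE row_number >= 5
full_rows = [rn for rn, frame in frames if len(frame) == 5]
print(full_rows)  # [5, 6]
```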
Query output is saved as the DataFrame alerts.
-- Flag whether the height of a product is within the control limits

WITH CTE1 AS (
    -- Rolling statistics over each operator's five most recent items (current row + 4 preceding)
    SELECT operator,
           height,
           AVG(height) OVER(PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height,
           STDDEV(height) OVER(PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height,
           ROW_NUMBER() OVER(PARTITION BY operator ORDER BY item_no) AS row_number
    FROM manufacturing_parts
),
CTE2 AS (
    -- Control limits; keep only rows whose window holds a full five items
    SELECT *,
           avg_height + 3 * (stddev_height / SQRT(5)) AS ucl,
           avg_height - 3 * (stddev_height / SQRT(5)) AS lcl
    FROM CTE1
    WHERE row_number >= 5
)
-- Alert when the height lies outside the range [lcl, ucl]
SELECT *,
       CASE WHEN height NOT BETWEEN lcl AND ucl THEN TRUE
            ELSE FALSE
       END AS alert
FROM CTE2;
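To check the query's logic outside the database, the following Python sketch reproduces it on a handful of rows. The function name flag_alerts, the operator labels, and all heights are hypothetical. As above, it assumes STDDEV means the sample standard deviation. Each step is commented with the SQL clause it stands in for.

```python
import math
import statistics
from collections import defaultdict

WINDOW = 5  # current row + 4 preceding rows, as in the SQL frame

# Hypothetical (item_no, operator, height) rows; values are made up for illustration
parts = [
    (1, "A", 20.0), (2, "B", 10.0), (3, "A", 20.0), (4, "B", 10.5),
    (5, "A", 20.0), (6, "B", 9.5), (7, "A", 20.0), (8, "B", 10.0),
    (9, "A", 20.0), (10, "B", 10.0), (11, "A", 25.0), (12, "B", 10.0),
]


def flag_alerts(rows):
    """Return (operator, item_no, height, lcl, ucl, alert) for every full window."""
    by_operator = defaultdict(list)  # PARTITION BY operator
    for item_no, operator, height in rows:
        by_operator[operator].append((item_no, height))

    results = []
    for operator, items in sorted(by_operator.items()):
        items.sort()  # ORDER BY item_no
        heights = [h for _, h in items]
        for i, (item_no, height) in enumerate(items):
            if i + 1 < WINDOW:  # WHERE row_number >= 5
                continue
            frame = heights[i - WINDOW + 1: i + 1]  # includes the current part
            avg_height = statistics.mean(frame)
            stddev_height = statistics.stdev(frame)  # sample standard deviation
            margin = 3 * (stddev_height / math.sqrt(WINDOW))
            ucl, lcl = avg_height + margin, avg_height - margin
            alert = not (lcl <= height <= ucl)  # NOT BETWEEN lcl AND ucl (inclusive bounds)
            results.append((operator, item_no, height, lcl, ucl, alert))
    return results


for row in flag_alerts(parts):
    print(row)
```

On this sample, only item 11 (operator A) is flagged. Its window [20, 20, 20, 20, 25] has a mean of 21 and a standard deviation of √5, which gives limits of 18 to 24. As in the query, the part being checked is part of its own window, so an outlier also widens the limits it is compared against.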

Extended Project below

After identifying individual out-of-control products, the team suspects that certain operators may need further training. They want to pinpoint operators whose machines consistently produce parts outside control limits.

Using common table expressions and aggregations, you will identify operators whose machines produce more alerts than the average number of alerts per operator.

Query output is saved as the DataFrame df.
WITH CTE1 AS (
    SELECT operator,
           height,
           AVG(height) OVER(PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height,
           STDDEV(height) OVER(PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height,
           ROW_NUMBER() OVER(PARTITION BY operator ORDER BY item_no) AS row_number
    FROM manufacturing_parts
),
CTE2 AS (
    SELECT *,
           avg_height + 3 * (stddev_height / SQRT(5)) AS ucl,
           avg_height - 3 * (stddev_height / SQRT(5)) AS lcl
    FROM CTE1
    WHERE row_number >= 5
),
alerts AS (
    -- 1 when the height falls outside the control limits, 0 otherwise
    SELECT *,
           CASE WHEN height NOT BETWEEN lcl AND ucl THEN 1
                ELSE 0
           END AS alert
    FROM CTE2
),
operator_alerts AS (
    -- Total alerts per operator
    SELECT operator,
           SUM(alert) AS total_alerts
    FROM alerts
    GROUP BY operator
),
overall_avg_alerts AS (
    -- Average of the per-operator totals
    SELECT AVG(total_alerts) AS avg_alerts
    FROM operator_alerts
)
-- Operators with more alerts than the per-operator average
SELECT o.operator,
       o.total_alerts,
       oa.avg_alerts
FROM operator_alerts AS o
CROSS JOIN overall_avg_alerts AS oa
WHERE o.total_alerts > oa.avg_alerts;
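The last three CTEs reduce the per-part flags to one comparison per operator. Here is a rough Python sketch of the same steps on invented data; the operator names and alert values are hypothetical. The average is taken over the operators' totals, so each operator counts once regardless of how many parts its machine produced.

```python
from collections import defaultdict

# Hypothetical (operator, alert) pairs, shaped like the rows of the alerts CTE
alert_rows = [
    ("Op-1", 1), ("Op-1", 0), ("Op-1", 1),
    ("Op-2", 0), ("Op-2", 0),
    ("Op-3", 1), ("Op-3", 1), ("Op-3", 1), ("Op-3", 0),
]

# operator_alerts: SUM(alert) ... GROUP BY operator
total_alerts = defaultdict(int)
for operator, alert in alert_rows:
    total_alerts[operator] += alert

# overall_avg_alerts: AVG(total_alerts) over the per-operator totals
avg_alerts = sum(total_alerts.values()) / len(total_alerts)

# Final SELECT: operators whose total exceeds the per-operator average
flagged = sorted((op, n) for op, n in total_alerts.items() if n > avg_alerts)

print(round(avg_alerts, 4))  # 1.6667
print(flagged)               # [('Op-1', 2), ('Op-3', 3)]
```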