Skip to content
(SQL) Project: Evaluate a Manufacturing Process
  • AI Chat
  • Code
  • Report
  • Identify issues in a manufacturing process using SQL window functions and subqueries.

    Manufacturing processes vary, and ensuring consistent quality requires careful monitoring and adjustment. Sometimes, the adjustments can be too extreme and may have the opposite effect, reducing quality. Apply SQL window functions, partitioning and summary statistics to manufacturing data to identify if any adjustments were wrong for the company.

    Manufacturing processes for any product is like putting together a puzzle. Products are pieced together step by step, and keeping a close eye on the process is important.

    Analyze the manufacturing_parts table and determine whether the manufacturing process is performing within acceptable control limits.

    For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses data to determine whether the process works well. Processes are only adjusted if measurements fall outside of an acceptable range.

    This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:

    The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.

    Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range and identify any points in the process that fall outside of the range and therefore require adjustments. This will ensure a smooth running manufacturing process consistently making high-quality products.

    The data

    The data is available in the manufacturing_parts table which has the following fields:

    • item_no: the item number
    • length: the length of the item made
    • width: the width of the item made
    • height: the height of the item made
    • operator: the operating machine

    Create an alert that flags whether the height of a product is within the control limits for each operator using the formulas provided in the workbook.

    1. The final query should return the following fields: operator, row_number, height, avg_height, stddev_height, ucl, lcl, alert, and be ordered by the item_no.
    2. The alert column will be your boolean flag.
    3. Use a window function of length 5 to calculate the control limits, considering rows up to and including the current row; incomplete windows should be excluded from the final query output.
    Spinner
    DataFrameavailable as
    df1
    variable
    SELECT * FROM information_schema.columns
    WHERE table_name = 'manufacturing_parts'
    Spinner
    DataFrameavailable as
    df2
    variable
    SELECT COUNT(*) FROM manufacturing_parts
    Spinner
    DataFrameavailable as
    df
    variable
    SELECT * FROM manufacturing_parts
    Spinner
    DataFrameavailable as
    df3
    variable
    SELECT AVG(height) AS avg_height
    FROM manufacturing_parts
    Spinner
    DataFrameavailable as
    df4
    variable
    SELECT STDDEV(height) AS stddev_height
    FROM manufacturing_parts;

    1. Calculate moving average and moving standard deviation

    You'll need to create a subquery that is partitioned by operator, ordered by item_no and calculates the average and standard deviation of height over a window that includes rows starting 4 rows before the current row up to the current row (One way to create a window of length 5 is to use the rows 4 PRECEDING from the current row and the CURRENT ROW); alias the fields as avg_height and stddev_height.

    Spinner
    DataFrameavailable as
    df6
    variable
    SELECT 
    	operator,
    	ROW_NUMBER() OVER (PARTITION BY operator 
    						ORDER BY item_no 
    						ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS row_number, 
    	height, 
    	AVG(height) OVER (PARTITION BY operator 
    					  ORDER BY item_no 
    					  ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height, 
    	STDDEV(height) OVER (PARTITION BY operator 
    						 ORDER BY item_no 
    						 ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height
    FROM manufacturing_parts
    Spinner
    DataFrameavailable as
    df5
    variable
    -- simplified query from above
    SELECT 
    	operator,
    	ROW_NUMBER() OVER w AS row_number, 
    	height, 
    	AVG(height) OVER w AS avg_height, 
    	STDDEV(height) OVER w AS stddev_height
    FROM manufacturing_parts 
    WINDOW w AS (
    	PARTITION BY operator 
    	ORDER BY item_no 
    	ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)

    Remember the criteria:

    This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:

    The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.

    2. Calculate upper and lower control limits

    Using the previous query as a subquery, create two new columns by using the average and standard deviation columns from the subquery and applying them to the formula shown to calculate the upper and lower limits, aliased as ucl and lcl respectively.

    Spinner
    DataFrameavailable as
    df7
    variable
    SELECT
    	a.*, 
    	a.avg_height + 3*a.stddev_height/SQRT(5) AS ucl, 
    	a.avg_height - 3*a.stddev_height/SQRT(5) AS lcl  
    FROM (
    	SELECT 
    		operator,
    		ROW_NUMBER() OVER w AS row_number, 
    		height, 
    		AVG(height) OVER w AS avg_height, 
    		STDDEV(height) OVER w AS stddev_height
    	FROM manufacturing_parts 
    	WINDOW w AS (
    		PARTITION BY operator 
    		ORDER BY item_no 
    		ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
    	) AS a
    WHERE a.row_number >= 5

    3. Creating an alert to evaluate the manufacturing process

    Use your previous work as a subquery, you need to add a new boolean (true/false) alert field that has checked each value of height against theucl and lcl fields you have just created. You'll then need to add a filter to remove incomplete window rows.

    One way to compare height to ucl and lcl might be to use a CASE statement that checks whether a value is or is not BETWEEN two other values and flagging them as either TRUE or FALSE.