Project: Evaluate a Manufacturing Process (Portfolio)

    The manufacturing process for any product is like putting together a puzzle. Products are pieced together step by step, and keeping a close eye on the process is important.

    For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses data to determine whether the process works well. Processes are only adjusted if measurements fall outside of an acceptable range.

    This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:

        ucl = avg_height + 3 * stddev_height / sqrt(5)
        lcl = avg_height - 3 * stddev_height / sqrt(5)

    The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
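
    As a rough illustration of the control limits, the sketch below computes them for a single window of five heights, using ucl = avg_height + 3 * stddev_height / sqrt(5) and lcl = avg_height - 3 * stddev_height / sqrt(5) as in the queries further down. The function name control_limits and the sample heights are hypothetical. The sketch also assumes that the 5 under the square root is the window size and that the standard deviation is the sample (n - 1) form, which may differ from what a given SQL dialect's STDDEV returns.

```python
import math
import statistics


def control_limits(heights):
    """Return (ucl, lcl) for one window of height measurements."""
    n = len(heights)
    avg_height = statistics.mean(heights)
    # statistics.stdev is the sample standard deviation (n - 1 denominator);
    # this is an assumption about what STDDEV computes in the SQL queries.
    stddev_height = statistics.stdev(heights)
    margin = 3 * stddev_height / math.sqrt(n)
    return avg_height + margin, avg_height - margin


# Hypothetical window of five heights, for illustration only.
window = [20.0, 20.5, 19.5, 20.2, 19.8]
ucl, lcl = control_limits(window)
print(f"ucl={ucl:.3f}, lcl={lcl:.3f}")
```

    The limits sit symmetrically around the window's average, so a more variable window produces a wider acceptable range.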

    Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range and identify any points in the process that fall outside it and therefore require adjustment. This helps keep the manufacturing process running smoothly and consistently producing high-quality products.
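
    To make the windowed analysis concrete before turning to SQL, here is a minimal Python sketch of the same idea: for each operator, take the current part plus the four preceding parts (ordered by item number), compute the control limits for that window, and flag the current height if it falls outside them. The function flag_alerts, the tuple layout, and the sample rows are all hypothetical, and the sample standard deviation is an assumption about the SQL dialect's STDDEV.

```python
import math
import statistics
from collections import defaultdict

WINDOW = 5  # the current row plus the 4 preceding rows


def flag_alerts(parts):
    """parts: iterable of (item_no, operator, height) tuples.

    Returns one dict per row that has a full 5-row window.
    """
    by_operator = defaultdict(list)
    # Sorting by item_no first keeps each operator's rows in item order.
    for item_no, operator, height in sorted(parts):
        by_operator[operator].append((item_no, height))

    results = []
    for operator, rows in by_operator.items():
        # Start at the 5th row so every window holds exactly WINDOW heights.
        for i in range(WINDOW - 1, len(rows)):
            heights = [h for _, h in rows[i - WINDOW + 1 : i + 1]]
            avg_height = statistics.mean(heights)
            stddev_height = statistics.stdev(heights)  # sample std dev (assumed)
            margin = 3 * stddev_height / math.sqrt(WINDOW)
            ucl, lcl = avg_height + margin, avg_height - margin
            item_no, height = rows[i]
            results.append({
                "operator": operator,
                "row_number": i + 1,
                "height": height,
                "ucl": ucl,
                "lcl": lcl,
                "alert": not (lcl <= height <= ucl),
            })
    return results


# Hypothetical measurements for a single operator, for illustration only.
sample_parts = [
    (1, "Op-1", 20.0), (2, "Op-1", 20.1), (3, "Op-1", 19.9),
    (4, "Op-1", 20.0), (5, "Op-1", 20.1), (6, "Op-1", 23.0),
]
results = flag_alerts(sample_parts)
for row in results:
    print(row["row_number"], row["height"], row["alert"])
```

    Rows 1 to 4 are skipped because their windows would hold fewer than five measurements, which is also why a query over such a window would filter on the row number.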

    The data

    The data is available in the manufacturing_parts table, which has the following fields:

    • item_no: the item number
    • length: the length of the item made
    • width: the width of the item made
    • height: the height of the item made
    • operator: the operating machine
    Result available as DataFrame variable: df
    -- Step 1: calculate the moving average and moving standard deviation of height

    SELECT
        operator,
        -- Rolling mean over the current row and the 4 preceding rows, per operator
        AVG(height) OVER (
            PARTITION BY operator
            ORDER BY item_no
            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
        ) AS avg_height,
        -- Rolling standard deviation over the same 5-row window
        STDDEV(height) OVER (
            PARTITION BY operator
            ORDER BY item_no
            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
        ) AS stddev_height
    FROM manufacturing_parts;
    
    Result available as DataFrame variable: df2
    -- Step 2: calculate the upper and lower control limits

    -- The step 1 query, wrapped in a CTE
    WITH avg AS (
        SELECT
            operator,
            AVG(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS avg_height,
            STDDEV(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS stddev_height
        FROM manufacturing_parts
    )
    -- Step 2 proper: derive the control limits from the rolling statistics
    SELECT
        avg.*,
        avg.avg_height + 3 * avg.stddev_height / SQRT(5) AS ucl,
        avg.avg_height - 3 * avg.stddev_height / SQRT(5) AS lcl
    FROM avg;
    Result available as DataFrame variable: df3
    -- Step 3: create an alert to evaluate the manufacturing process

    -- CTE 1: the rolling statistics from step 1, now also keeping item_no and height
    WITH avg AS (
        SELECT
            operator,
            item_no, -- needed to order rows in the next CTE
            height,  -- needed for the alert comparison
            AVG(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS avg_height,
            STDDEV(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS stddev_height
        FROM manufacturing_parts
    ),

    -- CTE 2: number the rows per operator and add the control limits from step 2
    calcu AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY operator ORDER BY item_no) AS row_num,
            avg.avg_height + 3 * avg.stddev_height / SQRT(5) AS ucl,
            avg.avg_height - 3 * avg.stddev_height / SQRT(5) AS lcl
        FROM avg
    )

    -- Final query: flag heights that fall outside the control limits
    SELECT
        *,
        CASE WHEN height NOT BETWEEN lcl AND ucl
             THEN TRUE ELSE FALSE END AS alert
    FROM calcu -- only CTE 2 needs to be referenced here
    WHERE row_num >= 5; -- skip rows whose window has fewer than 5 measurements
    Result available as DataFrame variable: alerts
    -- Step 3: create an alert to evaluate the manufacturing process (explicit column list)

    -- CTE 1: the rolling statistics from step 1, now also keeping item_no and height
    WITH avg AS (
        SELECT
            operator,
            item_no, -- needed to order rows in the next CTE
            height,  -- needed for the alert comparison
            AVG(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS avg_height,
            STDDEV(height) OVER (
                PARTITION BY operator
                ORDER BY item_no
                ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
            ) AS stddev_height
        FROM manufacturing_parts
    ),

    -- CTE 2: number the rows per operator and add the control limits from step 2
    calcu AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY operator ORDER BY item_no) AS row_number,
            avg.avg_height + 3 * avg.stddev_height / SQRT(5) AS ucl,
            avg.avg_height - 3 * avg.stddev_height / SQRT(5) AS lcl
        FROM avg
    )

    -- Final query: flag heights that fall outside the control limits
    SELECT
        operator,
        row_number,
        height,
        avg_height,
        stddev_height,
        ucl,
        lcl,
        CASE WHEN height NOT BETWEEN lcl AND ucl THEN TRUE ELSE FALSE END AS alert
    FROM calcu -- only CTE 2 needs to be referenced here
    WHERE row_number >= 5; -- skip rows whose window has fewer than 5 measurements