Project: Evaluate a Manufacturing Process
    Manufacturing any product is like putting together a puzzle. Products are pieced together step by step, and keeping a close eye on the process is important.

    For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses data to determine whether the process works well. Processes are only adjusted if measurements fall outside of an acceptable range.

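    This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:

        ucl = avg_height + 3 * stddev_height / sqrt(5)
        lcl = avg_height - 3 * stddev_height / sqrt(5)

    Here avg_height and stddev_height are the mean and standard deviation of height over a rolling window of 5 parts (the current part and the 4 before it) for the same operator.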

    The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
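
    To make the limits concrete, here is a small worked example in Python. It is only a sketch for checking the arithmetic by hand: the five heights are made up, the helper name control_limits is hypothetical, and it assumes the standard deviation is the sample standard deviation (n - 1 denominator), which is what STDDEV returns in PostgreSQL.

```python
import math
import statistics


def control_limits(heights, sigmas=3.0):
    """Return (lcl, ucl) for one window of height measurements."""
    n = len(heights)
    mean = statistics.mean(heights)
    # Sample standard deviation (n - 1 denominator); assumed to match SQL STDDEV.
    sd = statistics.stdev(heights)
    margin = sigmas * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Made-up heights for one window of 5 consecutive parts.
window = [20.0, 21.0, 19.0, 20.5, 19.5]
lcl, ucl = control_limits(window)
print(round(lcl, 3), round(ucl, 3))  # -> 18.939 21.061
```

    In this window the latest part (19.5) lies between the two limits, so it would not raise an alert.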

    Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range and identify any points in the process that fall outside it and therefore require adjustment. This helps keep the manufacturing process running smoothly and consistently producing high-quality products.

    The data

    The data is available in the manufacturing_parts table, which has the following fields:

    • item_no: the item number
    • length: the length of the item made
    • width: the width of the item made
    • height: the height of the item made
    • operator: the operating machine
    DataFrame available as variable: alerts
    /* 
    A human-friendly approach to building the query that is easy to read, comprehend, maintain, and debug.
    It is not the most efficient way to accomplish the task, because the same window expressions are written out again for every derived column.
    On large datasets, especially in live or interactive applications where quick response times matter,
    this could cause performance issues.
    Nonetheless, it is good practice for working with window functions and CTEs.
    
    The first step is to build a CTE in which all metrics are computed with window functions.
    The main reason for using a CTE is that window function results cannot be filtered directly with 'WHERE' or 'HAVING'
    in the same SELECT, because window functions are evaluated after those clauses.
    
    WITH metrics AS (
        SELECT 
            operator, 
            row_number() OVER (PARTITION BY operator ORDER BY item_no) AS row_number, 
            height, 
            AVG(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height,
            STDDEV(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height,
            AVG(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) 
            + 3 * (STDDEV(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) / SQRT(5)) AS ucl,
            AVG(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) 
            - 3 * (STDDEV(height) OVER (PARTITION BY operator ORDER BY item_no ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) / SQRT(5)) AS lcl
        FROM 
            manufacturing_parts
    )
    
    In the second step, we run the query on the CTE with a simple filter using the 'WHERE' clause.
    
    SELECT 
        operator, 
        row_number, 
        height, 
        avg_height, 
        stddev_height, 
        ucl, 
        lcl,
        CASE         
            WHEN row_number >= 5 AND height BETWEEN lcl AND ucl THEN False
            WHEN row_number >= 5 THEN True
            ELSE NULL
        END AS alert
    FROM 
        metrics
    WHERE 
        row_number >= 5
    
    A more efficient approach: each calculation is performed once, in its own CTE, instead of being repeated as in the snippet above.
    The window definition is also written once, in a named WINDOW clause, rather than being repeated in every expression.
    This approach still keeps the code easy to read, comprehend, maintain, and debug.
    */
    
    -- First CTE for defining the window function along with columns of interest
    WITH metrics AS (
        SELECT 
            operator,
            ROW_NUMBER() OVER win AS row_number, 
            height, 
            AVG(height) OVER win AS avg_height, 
            STDDEV(height) OVER win AS stddev_height
        FROM manufacturing_parts 
        WINDOW win AS (
            PARTITION BY operator 
            ORDER BY item_no 
            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
    )
    -- Second CTE to calculate the control limits (ucl, lcl) and keep only rows with a full 5-part window
    , filtered_metrics AS (
        SELECT 
            operator, 
            row_number, 
            height, 
            avg_height, 
            stddev_height, 
            avg_height + 3 * stddev_height / SQRT(5) AS ucl, 
            avg_height - 3 * stddev_height / SQRT(5) AS lcl
        FROM metrics
        WHERE row_number >= 5
    )
    -- In the final query, flag each row whose height falls outside the [lcl, ucl] range
    SELECT 
        operator, 
        row_number, 
        height, 
        avg_height, 
        stddev_height, 
        ucl, 
        lcl,
        CASE
            WHEN height NOT BETWEEN lcl AND ucl THEN TRUE
            ELSE FALSE
        END AS alert
    FROM filtered_metrics;
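
    As a sanity check on the query logic, the same rolling calculation can be sketched in plain Python and run on a handful of rows. This is not part of the project solution: the function name rolling_alerts, the sample rows, and the operator label are all hypothetical, and the sketch assumes STDDEV is the sample standard deviation (as in PostgreSQL).

```python
import math
import statistics
from collections import defaultdict


def rolling_alerts(rows, window=5):
    """Mirror the SQL: per operator, ordered by item_no, rolling window of `window` rows."""
    by_operator = defaultdict(list)
    for row in rows:
        by_operator[row["operator"]].append(row)

    results = []
    for operator in sorted(by_operator):
        parts = sorted(by_operator[operator], key=lambda r: r["item_no"])
        # Start at the first row that has a full window (row_number >= 5 in the SQL).
        for i in range(window - 1, len(parts)):
            heights = [p["height"] for p in parts[i - window + 1 : i + 1]]
            avg_height = statistics.mean(heights)
            stddev_height = statistics.stdev(heights)  # sample standard deviation
            margin = 3 * stddev_height / math.sqrt(window)
            lcl, ucl = avg_height - margin, avg_height + margin
            height = parts[i]["height"]
            results.append({
                "operator": operator,
                "row_number": i + 1,
                "height": height,
                "ucl": ucl,
                "lcl": lcl,
                "alert": not (lcl <= height <= ucl),
            })
    return results


# Made-up measurements for a single hypothetical operator.
sample_rows = [
    {"item_no": n, "operator": "Op-1", "height": h}
    for n, h in enumerate([20.0, 21.0, 19.0, 20.5, 19.5, 25.0], start=1)
]
sample_alerts = rolling_alerts(sample_rows)
for r in sample_alerts:
    print(r["row_number"], r["height"], r["alert"])
```

    With these made-up values, row 5 stays within its limits and row 6 (height 25.0) exceeds its UCL of about 24.18, so only row 6 is flagged. Note that, as in the SQL, the current part is included in its own window.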