Skip to content
GitHub Project: Evaluating a Manufacturing Process using SQL and Python
Evaluating a Manufacturing Process
1. π Project Overview
- Brief Description:
The goal of the project is to analyze historical manufacturing data to define an acceptable range and identify any points in the process that fall outside of this range and therefore require adjustments. This methodical approach, known as statistical process control (SPC), is designed to ensure that the manufacturing process consistently delivers high-quality parts. - Key Questions:
- To evaluate whether the manufacturing process (specifically the height of manufactured parts) is within statistical control limits.
- To identify operators who are producing a disproportionately high number of out of the range parts, implying that their machines might be malfunctioning or they may need retraining.
2. π οΈ Data Preparation
- Load and Preview Dataset:
DataFrameas
df
variable
-- Preview of the dataset
SELECT *
FROM manufacturing_parts
LIMIT 5;
-
Key Variables:
The data is available in themanufacturing_parts
table which has the following fields and data types:item_no
: the item number,BIGINT
length
: the length of the item made,DOUBLE PRECISION
width
: the width of the item made,DOUBLE PRECISION
height
: the height of the item made,DOUBLE PRECISION
operator
: the operating machine,TEXT
3. π Analysis & Visualizations
- Insight 1: Create an alert that flags whether the height of a part is within the control limits for each operator.
- SQL query calculates the upper control limit (ucl) and lower control limit (lcl) based on the average height and standard deviation of the last 5 heights for each operator.
- If the height is outside these limits, it sets an alert flag to TRUE; otherwise, it sets it to FALSE.
- The innermost subquery (a) calculates the row number, height, average height, and standard deviation of height for the last 5 rows for each operator.
- The middle subquery (b) calculates the control limits (ucl and lcl) using the average height and standard deviation.
- The outer query flags whether the height is outside the control limits.
- The query only considers rows where there are at least 5 preceding rows (row_number >= 5).
- SQL query calculates the upper control limit (ucl) and lower control limit (lcl) based on the average height and standard deviation of the last 5 heights for each operator.
DataFrameas
df_alerts
variable
-- Flag whether the height of a part is within the control limits
SELECT
b.*,
CASE
WHEN b.height NOT BETWEEN b.lcl AND b.ucl THEN TRUE
ELSE FALSE
END AS alert
FROM (
SELECT
a.*,
a.avg_height + 3 * a.stddev_height/SQRT(5) AS ucl,
a.avg_height - 3 * a.stddev_height/SQRT(5) AS lcl
FROM (
SELECT
operator,
ROW_NUMBER() OVER w AS row_number,
height,
AVG(height) OVER w AS avg_height,
STDDEV(height) OVER w AS stddev_height
FROM manufacturing_parts
WINDOW w AS (
PARTITION BY operator
ORDER BY item_no
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
) AS a
WHERE a.row_number >= 5
) AS b;
Visualization 1 Setup:
- The query result is stored in the `df_alerts` data frame. This dataset includes: `operator`, `row_number` (tracking time/order per operator), `height`, `avg_height`, `stddev_height`, `ucl`, `lcl`, `alert` flag. - π Control Chart (aka Shewhart Chart) components: - X-axis: row_number - Y-axis: height - Lines: `UCL` (Upper Control Limit), `LCL` (Lower Control Limit), `Mean` (avg_height) - Points: Color-coded by `alert` (**red** for True, **blue** for False) - Option: Separate chart per operator using subplots or filters
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style = "darkgrid")
# Filter data for operator 'Op-1'
df_op1 = df_alerts[df_alerts['operator'] == 'Op-1']
# Create a figure and axis
plt.figure(figsize = (10, 6))
# Plot the height values
sns.lineplot(x = "row_number", y = "height", hue = "alert", palette = {True: "red", False: "blue"}, data = df_op1, marker = "o")
# Plot the UCL, LCL, and Mean lines
plt.plot(df_op1['row_number'], df_op1['ucl'], color = 'green', linestyle = '--', label = 'UCL')
plt.plot(df_op1['row_number'], df_op1['lcl'], color = 'orange', linestyle = '--', label = 'LCL')
plt.plot(df_op1['row_number'], df_op1['avg_height'], color = 'purple', linestyle = '-', label = 'Mean')
# Styling
plt.title('Control Chart for Op-1', fontsize = 12)
plt.xlabel('Row Number', fontsize = 12)
plt.ylabel('Height', fontsize = 12)
plt.legend()
plt.tight_layout()
plt.show()
- Visualization 1 Analysis:
- The plotted control chart is defferent form a classic control chart where
UCL,
LCL
andMean
are typically horizontal lines, representing control thresholds. - The plotte chart uses rolling UCL/LCL/mean values, which vary row by row.
- This is technically accurate from the SQL output (because it was based on a moving window).
- The plotted control chart is defferent form a classic control chart where
- Visualization 1 Alternative (Fixed Control Limits):
- To visualize process control stability over time, used fixed control limits calculated from the entire data sample.
- Computed overall
mean
height for operator 'Op-1', overallstddev
for height,UCL,
andLCL
. - Used fixed control limits for plotting; rolling values for calculation and flagging.
# Filter data for operator 'Op-1'
df_op1 = df_alerts[df_alerts['operator'] == 'Op-1']
# Calculate overall stats
mean_height = df_op1['height'].mean()
stddev_height = df_op1['height'].std()
ucl = mean_height + 3 * stddev_height
lcl = mean_height - 3 * stddev_height
# Create control chart
plt.figure(figsize = (10, 6))
# Plot height values over row_number, color-coded by alert
sns.lineplot(x = "row_number", y = "height", hue = "alert", palette = {True: "red", False: "blue"}, data = df_op1, marker = "o")
# Plot horizontal UCL, LCL, and mean lines
plt.axhline(y = ucl, color = 'green', linestyle = '--', label = 'UCL')
plt.axhline(y = lcl, color = 'orange', linestyle = '--', label = 'LCL')
plt.axhline(y = mean_height, color = 'purple', linestyle = '-', label = 'Mean')
# Styling
plt.title('Control Chart for Op-1', fontsize = 12)
plt.xlabel('Row Number', fontsize = 12)
plt.ylabel('Height', fontsize = 12)
plt.legend()
plt.tight_layout()
plt.show()
- Visualization 1 (Dynamic Control Limits):
- In the initial version of the control chart, I chose to use dynamic (rolling) control limits based on a moving window of five parts. This approach aligns with short-term statistical process control (SPC) practices where local variation is monitored closely in real time. The dynamic limits reflect the immediate operating conditions of the machine and operator, making them more sensitive to shifts in the process and potentially catching subtle trends that fixed thresholds may miss.
- While traditional control charts use fixed global limits for overall process stability analysis, the dynamic version is more useful during early monitoring, tuning, or when the process is expected to evolve - such as during new operator training or after maintenance. For completeness and comparison, Iβve also included a classic control chart with fixed limits calculated from the overall mean and standard deviation.
4.1 π¬ Insight 1 Reflections
-
Most Common Pattern:
- The most common pattern observed in the dynamic control chart is the frequent adjustment of control limits. This is due to the rolling window approach, which recalculates the limits based on the most recent five parts. This pattern indicates that the process is being closely monitored and is responsive to short-term variations.
-
Unexpected Pattern:
- An unexpected pattern is the occasional false alerts generated by the dynamic control limits. Since the limits are more sensitive to immediate changes, minor fluctuations that are not indicative of a real process shift can trigger alerts. This can lead to overreacting to normal process noise.
-
Real-World Meaning:
- In a real-world manufacturing environment, the use of dynamic control limits can be particularly beneficial during periods of change, such as new operator training or post-maintenance. It allows for immediate detection of deviations and quick corrective actions. However, it also requires operators to be aware of the potential for false alerts and to use their judgment in interpreting the data. The inclusion of a classic control chart with fixed limits provides a baseline for overall process stability, ensuring that long-term trends are not overlooked.
- Insight 2: Identify operators whose alert rate is above average β indicating systemic issues beyond expected variation.
- SQL query is designed to identify and analyze height alerts for manufacturing parts based on statistical control limits.
- It consists of several Common Table Expressions (CTEs) and a final SELECT statement.
- The CTE
alerts
calculates whether the height of a part is outside the control limits (UCL and LCL) for each operator. - The CTE
operator_alerts
counts the number of alerts for each operator. - The CTE
alert_stats
calculates the total number of alerts, the alert rate for each operator, and the average alert rate. - The final SELECT statement selects the operator, alert count, alert rate, average alert rate, and whether the alert rate is above average.
- SQL query is designed to identify and analyze height alerts for manufacturing parts based on statistical control limits.
β
β
β
β
β