Turning the Mechanism
Table of Contents
- 1 Introduction
- 2 Libraries & Configurations
  - 2.1 Libraries
  - 2.2 Functions
  - 2.3 Configurations
- 3 Data Wrangling
  - 3.1 Data Validation
  - 3.2 Data Cleaning
- 4 Model Fitting and Evaluation
  - 4.1 Data Pre-Processing
  - 4.2 Logistic Regression model
  - 4.3 Random Forest model
- 5 Feature Importance
- 6 Individual Machine Prediction models
- 7 Summary
1. Introduction
In the specialized world of high-precision production, the efficiency of the production process depends heavily on the seamless operation of three distinct machines on the shop floor, each tasked with producing specific components.
To meet stringent production deadlines, minimizing machine downtime is paramount. Rather than reacting to unexpected machine failures, a data-driven strategy aimed at predicting downtime and enabling proactive maintenance is being implemented.
Operational data collected from these machines forms the basis of the predictive maintenance model. The goal is to analyze historical patterns and performance metrics in order to forecast potential downtime, optimize maintenance schedules, and ensure continuous, efficient production.
Objective:
The primary objective of implementing a predictive maintenance model is to enhance the operational efficiency of high-precision production processes by using historical data and performance metrics to forecast machine downtime accurately.
Methodology:
- Train and evaluate a predictive model for machine failure.
- Identify the machine operation features that are the strongest predictors of machine failure.
- Assess model accuracy on each individual machine's operation data.
Data:
Each row in the table represents the operational data for a single machine on a given day:
- Date - the date the reading was taken on.
- Machine_ID - the unique identifier of the machine being read.
- Assembly_Line_No - the unique identifier of the assembly line the machine is located on.
- Hydraulic_Pressure(bar), Coolant_Pressure(bar), and Air_System_Pressure(bar) - pressure measurements at different points in the machine.
- Coolant_Temperature, Hydraulic_Oil_Temperature, and Spindle_Bearing_Temperature - temperature measurements (in Celsius) at different points in the machine.
- Spindle_Vibration, Tool_Vibration, and Spindle_Speed(RPM) - vibration (measured in micrometers) and rotational speed measurements for the spindle and tool.
- Voltage(volts) - the voltage supplied to the machine.
- Torque(Nm) - the torque being generated by the machine.
- Cutting(KN) - the cutting force of the tool.
- Downtime - an indicator of whether the machine was down or not on the given day.
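The column list above can be turned into a quick schema check before any modelling. The following is a minimal sketch and not part of the original analysis: `EXPECTED_COLUMNS` is transcribed from the description above (the actual CSV headers may spell units differently), `missing_columns` is a hypothetical helper name, and the example frame holds made-up values.

```python
import pandas as pd

# Column names transcribed from the data description above. This is an
# assumption: the real file headers may differ (for example in unit suffixes).
EXPECTED_COLUMNS = [
    "Date", "Machine_ID", "Assembly_Line_No",
    "Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)", "Air_System_Pressure(bar)",
    "Coolant_Temperature", "Hydraulic_Oil_Temperature", "Spindle_Bearing_Temperature",
    "Spindle_Vibration", "Tool_Vibration", "Spindle_Speed(RPM)",
    "Voltage(volts)", "Torque(Nm)", "Cutting(KN)", "Downtime",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`, in order."""
    return [col for col in expected if col not in frame.columns]


# Tiny illustrative frame with made-up values; it deliberately lacks most columns.
example = pd.DataFrame({
    "Date": ["2022-01-01"],
    "Machine_ID": ["machine_a"],
    "Downtime": ["No_Machine_Failure"],
})
missing = missing_columns(example)
```

On the real data, an empty list from `missing_columns(df)` would suggest the file matches the description; a non-empty list points to the headers that need a closer look.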
Summary:
This analysis highlights the critical operational parameters influencing machine performance and failure patterns.
Key Findings
- Prediction Model:
  - The tuned Random Forest model demonstrated robustness, achieving precision and AUC scores of 1.0 in predicting machine downtime.
- Feature Importance:
  - Key factors influencing machine failure include torque (Nm), hydraulic pressure, and cutting force, with the Mann-Whitney U test confirming their significant impact. In contrast, features like air system pressure showed minimal influence.
- Prediction Model by Specific Machine Data:
  - The Random Forest model trained on individual machine data achieved precision scores between 0.96 and 1.0. Similarly, the tuned model trained and tested on the combined dataset attained a precision score of 1.0, indicating consistent performance regardless of the machine-specific data used for training.
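The Mann-Whitney U comparison mentioned under Feature Importance can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual test code: `mann_whitney_by_feature` is a hypothetical helper, the `Downtime` labels follow the values used elsewhere in this notebook, and the demo data is synthetic with generic feature names (one feature is shifted on purpose so the test has something to detect).

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def mann_whitney_by_feature(frame, features, target="Downtime",
                            failure_label="Machine_Failure"):
    """Compare each feature's distribution between failure and non-failure rows."""
    failed = frame[frame[target] == failure_label]
    healthy = frame[frame[target] != failure_label]
    rows = []
    for feature in features:
        # Two-sided test: are the two groups drawn from the same distribution?
        stat, p_value = mannwhitneyu(failed[feature].dropna(),
                                     healthy[feature].dropna(),
                                     alternative="two-sided")
        rows.append({"Feature": feature, "U": stat, "p_value": p_value})
    # Smallest p-value first, i.e. strongest evidence of a difference on top
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)


# Synthetic demo data (not the machine dataset)
rng = np.random.default_rng(0)
n = 200
labels = np.where(rng.random(n) < 0.5, "Machine_Failure", "No_Machine_Failure")
demo = pd.DataFrame({
    "Downtime": labels,
    # Shifted by 5 units for failure rows, so the groups differ by construction
    "shifted_feature": rng.normal(25, 3, n) - 5 * (labels == "Machine_Failure"),
    # Same distribution for both groups
    "unshifted_feature": rng.normal(6.5, 0.4, n),
})
result = mann_whitney_by_feature(demo, ["shifted_feature", "unshifted_feature"])
```

A small p-value only indicates that the two groups differ on that feature; it does not by itself measure how much the feature contributes to a model's predictions.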
2. Libraries & Configurations
2.1 Libraries
Loading the relevant libraries and user-defined functions
"""importing relevant libraries"""
import pandas as pd # for data manipulation
import numpy as np # for data computation
import matplotlib.pyplot as plt #for 2D data visualization
import seaborn as sns #for 2D data visualization
from scipy import stats # for statistics
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, log_loss
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import RFE #feature importance
from math import sqrt
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import kstest
from scipy.stats import mannwhitneyu
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
2.2 Functions
Defining the functions for the analysis
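from IPython.display import Markdown, display  # needed by printmd

# utility function to print markdown string
def printmd(string):
    display(Markdown(string))

# row-wise styler: highlight rows where the machine failed
def highlight(row):
    color = 'background-color: #8A0303' if row['Downtime'] == 'Machine_Failure' else ''
    return [color] * len(row)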
# Define the model function
def model(machine, df):
    """Train and evaluate a Random Forest on one machine's data; return (precision, auc)."""
    # Subset data for the specified machine
    df_machine = df[df['Machine_ID'] == machine].copy()
    # Map target variable to binary values
    df_machine['Downtime'] = df_machine['Downtime'].map({'Machine_Failure': 1, 'No_Machine_Failure': 0})
    # Define features and target
    y = df_machine['Downtime']
    X = df_machine.drop(['Downtime', 'Date', 'Machine_ID', 'Assembly_Line_No'], axis=1)
    # Split into train and test sets (SEED is the global seed set in Section 2.3)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=SEED)
    # Scale the features: fit on the training set only, then apply to the test set
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    # Initialize the Random Forest classifier
    rf = RandomForestClassifier(n_estimators=55, max_depth=7, min_samples_leaf=1, random_state=SEED)
    # Fit the model
    rf.fit(X_train_scaled, y_train)
    # Predictions: class labels and probability of the positive (failure) class
    y_pred_rf = rf.predict(X_test_scaled)
    y_pred_proba_rf = rf.predict_proba(X_test_scaled)[:, 1]
    # Evaluation metrics
    precision = precision_score(y_test, y_pred_rf)
    auc = roc_auc_score(y_test, y_pred_proba_rf)
    # Return metrics
    return precision, auc
def plot_metrics(metrics_df):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    # Precision Bar Chart
    ax[0].bar(metrics_df['Machine'], metrics_df['Precision'], color='#8A0303')
    ax[0].set_title('Precision by Machine')
    ax[0].set_ylabel('Precision')
    ax[0].set_xticklabels(metrics_df['Machine'], rotation=45, ha='right')
    # AUC Bar Chart
    ax[1].bar(metrics_df['Machine'], metrics_df['AUC'], color='gray')
    ax[1].set_title('AUC by Machine')
    ax[1].set_ylabel('AUC')
    ax[1].set_xticklabels(metrics_df['Machine'], rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
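The two functions above are meant to work together: `model` returns a `(precision, auc)` pair for one machine, and `plot_metrics` expects a frame with `Machine`, `Precision`, and `AUC` columns. One possible way to bridge them is sketched below. `collect_metrics` is a hypothetical helper that is not part of the original notebook, and `_stub_model` is a stand-in returning placeholder numbers so the sketch runs on its own.

```python
import pandas as pd


def collect_metrics(machine_ids, frame, model_fn):
    """Run model_fn(machine, frame) for each machine and tabulate the results."""
    records = []
    for machine in machine_ids:
        precision, auc = model_fn(machine, frame)
        records.append({"Machine": machine, "Precision": precision, "AUC": auc})
    # Column names match what plot_metrics reads
    return pd.DataFrame(records, columns=["Machine", "Precision", "AUC"])


# Stand-in for `model`; the returned values are placeholders, not real results.
def _stub_model(machine, frame):
    return 0.5, 0.5


metrics_df = collect_metrics(["machine_a", "machine_b"], pd.DataFrame(), _stub_model)
```

In the notebook itself this would presumably be called as `collect_metrics(df['Machine_ID'].unique(), df, model)`, with the result passed to `plot_metrics`.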
2.3 Configurations
Setting the configurations to be used for our analysis.
#set seaborn theme
sns.set_theme(style="darkgrid", palette="colorblind")
#displaying all columns
pd.set_option('display.max_columns', None)
# seed value
SEED = 42
3. Data Wrangling
Loading and wrangling the data
#loading the dataframe
df = pd.read_csv(r'data/machine_downtime.csv')
# viewing the dataframe
df.head()
# checking the number of rows and columns in the dataframe
df.shape
This data set has 2500 rows and 16 columns consisting of both numeric and categorical features.
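To see which columns are numeric and which are categorical, the frame can be split by dtype. The sketch below is one way to do that, not necessarily the approach taken later in the notebook: `split_columns_by_type` is a hypothetical helper, and the sample frame uses made-up values with a few of the column names described in the introduction.

```python
import pandas as pd


def split_columns_by_type(frame):
    """Return (numeric_columns, non_numeric_columns) as two lists of names."""
    numeric = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


# Made-up sample rows for illustration only
sample = pd.DataFrame({
    "Machine_ID": ["machine_a", "machine_b"],
    "Torque(Nm)": [24.1, 25.3],
    "Downtime": ["Machine_Failure", "No_Machine_Failure"],
})
numeric_cols, categorical_cols = split_columns_by_type(sample)
```

Note that a date column read by `pd.read_csv` without date parsing will land in the non-numeric group, so it may need separate handling.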