Predicting Industrial Machine Downtime: Level 1
📖 Background
You work for a manufacturer of high-precision metal components used in aerospace, automotive, and medical device applications. Your company operates three different machines on its shop floor that produce different sized components, so minimizing the downtime of these machines is vital for meeting production deadlines.
Your team wants to use a data-driven approach to predicting machine downtime, so that maintenance can be planned proactively rather than in reaction to machine failure. To support this, your company has been collecting operational data for over a year, along with whether each machine was down at the time of each reading.
In this first level, you're going to explore and describe the data. This level is aimed towards beginners. If you want to challenge yourself a bit more, check out level two!
💾 The data
The company has stored the machine operating data in a single table, available in 'data/machine_downtime.csv'.
Each row in the table represents the operational data for a single machine on a given day:
- "Date" - the date the reading was taken on.
- "Machine_ID" - the unique identifier of the machine being read.
- "Assembly_Line_No" - the unique identifier of the assembly line the machine is located on.
- "Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)", and "Air_System_Pressure(bar)" - pressure measurements at different points in the machine.
- "Coolant_Temperature", "Hydraulic_Oil_Temperature", and "Spindle_Bearing_Temperature" - temperature measurements (in Celsius) at different points in the machine.
- "Spindle_Vibration", "Tool_Vibration", and "Spindle_Speed(RPM)" - vibration (measured in micrometers) and rotational speed measurements for the spindle and tool.
- "Voltage(volts)" - the voltage supplied to the machine.
- "Torque(Nm)" - the torque being generated by the machine.
- "Cutting(KN)" - the cutting force of the tool.
- "Downtime" - an indicator of whether the machine was down or not on the given day.
Executive summary and recommendations
Because proactive maintenance matters so much for meeting production deadlines, our manufacturing company has decided to act on it. Machine operating data were recorded and sent for analysis. Here is what came out:
- The dataset originally contained 2500 readings; after cleaning, 2378 records remained.
- The readings cover a period of approximately 7 months, from 2021-11-24 to 2022-06-19.
- The average machine torque is 25.20 Nm.
- The assembly line with the most readings is Shopfloor-L1, both for readings with a machine failure and for readings without one.
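As a rough sketch of how figures like these could be derived with pandas, the example below runs on a small made-up table rather than the real file. The sample rows, the ISO date format, and the variable names are assumptions for illustration; only the column names come from the dataset description.

```python
import pandas as pd

# Made-up sample rows for illustration only; the real table has more columns and rows.
sample = pd.DataFrame({
    "Date": ["2021-11-24", "2022-01-10", "2022-06-19", "2022-03-05"],
    "Assembly_Line_No": ["Shopfloor-L1", "Shopfloor-L2", "Shopfloor-L1", "Shopfloor-L3"],
    "Torque(Nm)": [24.0, 26.0, 25.0, 27.0],
    "Downtime": ["Machine_Failure", "No_Machine_Failure",
                 "No_Machine_Failure", "Machine_Failure"],
})

# Parse the dates so min/max compare chronologically rather than as strings.
# The ISO format here is an assumption about how the dates are stored.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

n_readings = len(sample)
first_day, last_day = sample["Date"].min(), sample["Date"].max()
mean_torque = sample["Torque(Nm)"].mean()

# Readings per assembly line, split by downtime status.
readings_by_line = pd.crosstab(sample["Assembly_Line_No"], sample["Downtime"])

# Assembly line with the most readings overall.
busiest_line = sample["Assembly_Line_No"].value_counts().idxmax()
```

On the real data, the same calls would be applied to the cleaned table instead of `sample`.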
Recommendations
Here are some recommendations:
- The dataset is made up of 12 numeric fields and 4 non-numeric fields. Since the missing values occur in the numeric fields, they could be replaced by the median of each field, because the mean is affected by outliers.
- Further analysis could also examine the relationships between two or more variables, to better inform the choice of features for a model that predicts downtime.
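The median imputation suggested above could look roughly like the following. This is a minimal sketch on a toy table with one deliberate outlier, not the method used in this notebook (which drops rows with missing values); the values and the names `toy_impute` and `imputed` are made up for illustration.

```python
import pandas as pd

# Toy data for illustration: one missing torque value and one deliberate outlier.
toy_impute = pd.DataFrame({
    "Torque(Nm)": [24.0, 25.0, None, 26.0, 250.0],
    "Machine_ID": ["A", "B", "C", "D", "E"],  # hypothetical IDs
})

# Restrict imputation to numeric columns; text columns are left untouched.
numeric_cols = toy_impute.select_dtypes(include="number").columns

# Column-wise medians (NaN values are skipped by default).
medians = toy_impute[numeric_cols].median()

# Fill each numeric column's gaps with that column's median.
imputed = toy_impute.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(medians)

# For comparison: the mean is pulled far upward by the single outlier.
mean_torque_with_outlier = toy_impute["Torque(Nm)"].mean()
```

Here the median of the observed values stays close to the typical readings, while the mean is dominated by the outlier, which is the reason given above for preferring the median.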
import pandas as pd
downtime = pd.read_csv('data/machine_downtime.csv')
downtime.head()
Getting to know the dataset
downtime.info()
downtime.describe()
# drop_duplicates() returns a new DataFrame, so assign the result to keep it
downtime = downtime.drop_duplicates()
Explore negative and zero values for some variables in detail
downtime[(downtime['Hydraulic_Pressure(bar)'] <= 0) | (downtime['Spindle_Vibration'] <= 0) |
(downtime['Spindle_Speed(RPM)'] <= 0) | (downtime['Torque(Nm)'] <= 0)]
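To see where the non-positive values sit, the same check can be summarized per column and per downtime status. The sketch below uses made-up rows; `toy_checks`, `checked_cols`, and the other variable names are hypothetical, and only the column names match the dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
toy_checks = pd.DataFrame({
    "Hydraulic_Pressure(bar)": [120.0, -5.0, 98.0, 0.0],
    "Spindle_Vibration": [1.2, 0.8, -0.3, 1.0],
    "Spindle_Speed(RPM)": [20000, 18000, 0, 21000],
    "Torque(Nm)": [25.0, 24.0, 26.0, 23.0],
    "Downtime": ["No_Machine_Failure", "Machine_Failure",
                 "No_Machine_Failure", "Machine_Failure"],
})

checked_cols = ["Hydraulic_Pressure(bar)", "Spindle_Vibration",
                "Spindle_Speed(RPM)", "Torque(Nm)"]

# Boolean table: True where a measurement is zero or negative.
non_positive = toy_checks[checked_cols].le(0)

# How many non-positive values each column contains.
count_per_column = non_positive.sum()

# Rows with at least one non-positive measurement, counted by downtime status.
suspect_rows = non_positive.any(axis=1)
suspect_by_status = toy_checks.loc[suspect_rows, "Downtime"].value_counts()
```

Splitting the suspect rows by "Downtime" shows how many of them fall under No_Machine_Failure, which is the combination the cleaning step targets.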
Cleaning the dataset
Some rows have negative or zero values for operational measures while their Downtime value is No_Machine_Failure. This combination is not plausible, so those rows need to be removed.
downtime_clean = downtime[~(((downtime['Hydraulic_Pressure(bar)'] <= 0) | (downtime['Spindle_Vibration'] <= 0) |
(downtime['Spindle_Speed(RPM)'] <= 0) | (downtime['Torque(Nm)'] <= 0)) &
(downtime['Downtime'] != 'Machine_Failure'))]
downtime_clean.describe()
downtime_clean.isna().mean() * 100
Only numeric fields have missing values, and the proportion of NaN values in each numeric variable is less than 1%. Further cleaning can be done through simple imputation with the mean or median, or by dropping the rows with missing values. Here, the records with missing values were dropped.
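Dropping the incomplete records could be done along these lines. This is a minimal sketch on made-up rows, assuming every column is required for a row to be kept; `toy_missing` and `complete_rows` are illustrative names.

```python
import pandas as pd

# Made-up rows for illustration: two of the three rows have a missing value.
toy_missing = pd.DataFrame({
    "Torque(Nm)": [24.0, None, 26.0],
    "Spindle_Vibration": [1.1, 0.9, None],
    "Downtime": ["Machine_Failure", "No_Machine_Failure", "Machine_Failure"],
})

# Percentage of missing values per column, as in the check above.
pct_missing = toy_missing.isna().mean() * 100

# Keep only rows with no missing value in any column, then renumber the index.
complete_rows = toy_missing.dropna().reset_index(drop=True)
```

With less than 1% missing per variable, dropping rows loses little data; with larger gaps, imputation would preserve more records.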