Executive Summary
- Dataset contains 2,500 records from 3 machines with less than 1% missing values across all sensors
- Class distribution is balanced, with 50.6% normal operations and 49.4% downtime events
- March-April 2022 experienced highest failure concentrations with 3x more failures on weekdays than weekends
- Tuesday and Wednesday show peak failure incidents across all machines
- Hydraulic pressure displays unstable readings (20-200 bar) before failures
- All correlations between sensor measurements are weak (below 0.25)
- Six measurements emerged as strong predictors: hydraulic pressure, tool vibration, spindle speed, cutting force, torque, and coolant pressure
- All three machines maintain similar readings: ~100 bar hydraulic pressure, ~18°C coolant temperature, ~20,000 RPM spindle speed
- Single predictive model recommended due to uniform patterns across machines
- All sensor measurements provide independent information for failure prediction
I. Background
Manufacturing high-precision metal components requires consistent machine performance. A major manufacturer in aerospace, automotive, and medical device sectors operates three machines for different component sizes. Current maintenance relies on reactive approaches, leading to production delays and increased costs. The company collected operational data for over a year to shift towards predictive maintenance.
II. Objectives
The analysis aims to:
- Identify correlations between operational variables in the manufacturing process
- Detect temporal patterns in machine downtime
- Determine key factors linked to machine failures
- Create a foundation for developing predictive maintenance models
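As a rough illustration of how the first two objectives might be approached, the sketch below computes pairwise correlations and counts downtime events per weekday. The `records` frame and its values are made up for illustration, and treating `Downtime` as a boolean flag is an assumption rather than a confirmed property of the real file.

```python
import pandas as pd

# Hypothetical records (made-up values); Downtime is assumed to be a boolean flag
records = pd.DataFrame({
    "Date": pd.to_datetime(["2022-03-01", "2022-03-02", "2022-03-05", "2022-03-08"]),
    "Hydraulic_Pressure": [82.0, 110.5, 99.3, 74.8],
    "Torque": [24.1, 30.2, 18.7, 26.5],
    "Downtime": [True, False, False, True],
})

# Pairwise Pearson correlations between the numeric sensor columns
correlations = records[["Hydraulic_Pressure", "Torque"]].corr()

# Number of downtime events per weekday name
failures_by_weekday = (
    records.loc[records["Downtime"], "Date"].dt.day_name().value_counts()
)

print(correlations.round(2))
print(failures_by_weekday)
```

On the full dataset the same two calls would be applied to all sensor columns and to the complete date range.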
III. About the Dataset
The dataset contains daily operational readings from three manufacturing machines over a one-year period. It tracks 12 sensor measurements covering pressure, temperature, vibration, speed, force, and electrical readings. Each record also includes the date, machine ID, assembly line number, and downtime status. This data supports both broad operational analysis and specific investigation of the conditions leading to machine failures.
| Column Name | Description | Unit | Significance |
|---|---|---|---|
| Date | Daily timestamp of readings | YYYY-MM-DD | Tracks temporal patterns and maintenance history |
| Machine_ID | Unique machine identifier | Text | Enables machine-specific analysis and comparisons |
| Assembly_Line_No | Production line location | Integer | Maps physical layout and workflow dependencies |
| Hydraulic_Pressure | Hydraulic system pressure | bar | Indicates fluid power system health |
| Coolant_Pressure | Cooling system pressure | bar | Monitors heat dissipation efficiency |
| Air_System_Pressure | Pneumatic system pressure | bar | Reflects compressed air system status |
| Coolant_Temperature | Cooling system temperature | Celsius | Tracks thermal management effectiveness |
| Hydraulic_Oil_Temperature | Hydraulic fluid temperature | Celsius | Indicates system stress and oil condition |
| Spindle_Bearing_Temperature | Bearing temperature | Celsius | Monitors critical component health |
| Spindle_Vibration | Spindle oscillation | micrometers | Detects mechanical imbalances |
| Tool_Vibration | Cutting tool movement | micrometers | Indicates tool wear and stability |
| Spindle_Speed | Rotational velocity | RPM | Measures cutting performance |
| Voltage | Electrical input | volts | Monitors power supply stability |
| Torque | Rotational force | Nm | Indicates mechanical load |
| Cutting | Tool force | kN | Measures material removal effort |
| Downtime | Operational status | Boolean | Records machine availability |
The company has stored the machine operating data in a single table, available in 'data/machine_downtime.csv'.
import pandas as pd

# Load the dataset and display the first 5 rows
data = pd.read_csv('data/machine_downtime.csv')
data.head()

# Display information about the dataset including the data types and non-null counts for each column
data.info()

- Missing Values: There are missing values in the dataset, since the non-null count is not consistent across all columns.
- Column Naming Convention: The column names do not follow the snake_case naming convention.
- Data Types: The data types are appropriate and reflective of the data they hold.
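The observations above can be checked and acted on with a few lines of pandas. This is a minimal sketch on a hypothetical three-row `sample` frame with made-up values; the same calls would apply to the loaded `data` frame.

```python
import pandas as pd

# Hypothetical three-row sample standing in for data/machine_downtime.csv;
# the values are made up purely for illustration.
sample = pd.DataFrame({
    "Machine_ID": ["M1", "M2", "M3"],
    "Hydraulic_Pressure": [101.2, None, 98.7],
    "Spindle_Speed": [20100.0, 19850.0, None],
})

# Share of missing values per column, as a percentage
missing_pct = sample.isna().mean().mul(100).round(2)

# Lower-case the column names so they follow snake_case
sample.columns = sample.columns.str.lower()

print(missing_pct)
print(list(sample.columns))
```

Lower-casing is enough here because the names shown in the table already use underscores as separators.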
The current production setup consists of three machines operating across three assembly lines. This configuration presents a significant vulnerability in terms of production capacity.
IV. Exploratory Data Analysis
This section examines data characteristics to understand patterns and relationships. These insights will guide feature selection and preprocessing for predictive modeling.
# Display summary statistics of the DataFrame 'data'
data.describe().round(2)

Hydraulic pressure shows impossible negative readings (-14.33 bar) and unusually high spikes (191 bar) against a normal range of 76-126 bar. Coolant temperature jumps as high as 98.2°C while typically operating between 10-25°C. Voltage readings swing from 202V to 479V, far outside the typical 319-380V range. These extreme values could significantly impact our model's performance if not addressed.

Less Concerning Variations:
Spindle speed drops to 0 RPM and peaks at 27,957 RPM, but these might represent actual operational states rather than errors. Torque variations (0-55.55 Nm) and tool vibration (2.16-45.73) show wide ranges but follow expected operational patterns. Air system pressure stays remarkably consistent (5.06-7.97 bar), making it a potentially reliable predictor.

For modeling purposes, we should reduce the influence of the clearly erroneous readings in hydraulic pressure, coolant temperature, and voltage by applying a robust scaler, while preserving the natural variations in spindle speed and torque that might indicate different operating conditions.
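To make the robust-scaling idea concrete, the sketch below centres each column on its median and divides by its interquartile range, which is also what scikit-learn's `RobustScaler` does with its default settings. The `readings` frame is hypothetical: its rows are invented around the extremes and typical ranges quoted above, and `robust_scale` is an illustrative helper, not part of the original analysis.

```python
import pandas as pd

# Hypothetical readings (invented rows) that include the extremes quoted above
readings = pd.DataFrame({
    "Hydraulic_Pressure": [-14.33, 76.0, 95.0, 101.0, 126.0, 191.0],
    "Voltage": [202.0, 319.0, 340.0, 355.0, 380.0, 479.0],
})

def robust_scale(series: pd.Series) -> pd.Series:
    """Centre on the median and divide by the interquartile range (IQR)."""
    median = series.median()
    iqr = series.quantile(0.75) - series.quantile(0.25)
    return (series - median) / iqr

# Apply the scaling column by column
scaled = readings.apply(robust_scale)
print(scaled.round(2))
```

Because the median and IQR are barely affected by a few extreme rows, the bulk of the readings lands near zero while the outliers remain visible as large values. The helper does not guard against a zero IQR, which would need handling for near-constant columns.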