Predicting Industrial Machine Downtime: Level 1

📖 Background

You work for a manufacturer of high-precision metal components used in aerospace, automotive, and medical device applications. Your company operates three different machines on its shop floor that produce different sized components, so minimizing the downtime of these machines is vital for meeting production deadlines.

Your team wants to use a data-driven approach to predict machine downtime, so that maintenance can be planned proactively rather than in reaction to machine failures. To support this, your company has been collecting operational data for over a year, along with a record of whether each machine was down at the time of each reading.

In this first level, you're going to explore and describe the data. This level is aimed at beginners. If you want to challenge yourself a bit more, check out level two!

💾 The data

The company has stored the machine operating data in a single table, available in 'data/machine_downtime.csv'.

Each row in the table represents the operational data for a single machine on a given day:
  • "Date" - the date the reading was taken on.
  • "Machine_ID" - the unique identifier of the machine being read.
  • "Assembly_Line_No" - the unique identifier of the assembly line the machine is located on.
  • "Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)", and "Air_System_Pressure(bar)" - pressure measurements at different points in the machine.
  • "Coolant_Temperature", "Hydraulic_Oil_Temperature", and "Spindle_Bearing_Temperature" - temperature measurements (in Celsius) at different points in the machine.
  • "Spindle_Vibration", "Tool_Vibration", and "Spindle_Speed(RPM)" - vibration (measured in micrometers) and rotational speed measurements for the spindle and tool.
  • "Voltage(volts)" - the voltage supplied to the machine.
  • "Torque(Nm)" - the torque being generated by the machine.
  • "Cutting(KN)" - the cutting force of the tool.
  • "Downtime" - an indicator of whether the machine was down or not on the given day.

💪 Competition challenge

Create a report that covers the following:

  1. What are the first and last dates that readings were taken on?
  2. What is the average Torque?
  3. Which assembly line has the highest readings of machine downtime?
import pandas as pd

# Load the machine operating data and preview the first five rows
downtime = pd.read_csv('data/machine_downtime.csv')
downtime.head()

1. Initial Inspection

Let's extend the exploration by checking each column's type and non-null count:

# DataFrame.info() prints its summary and returns None, so call it directly
# instead of wrapping the result in a DataFrame
downtime.info()
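As a complement to the inspection above, the sketch below tabulates missing values and distinct values per column. The helper name `summarize_quality` and the tiny sample table are made up for illustration; they are not part of the original notebook, and the real `downtime` table may look different.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical sample standing in for the real table (values are invented)
sample = pd.DataFrame({
    "Machine_ID": ["M1", "M2", "M1"],
    "Torque(Nm)": [24.5, None, 30.1],
})

quality = summarize_quality(sample)
print(quality)
```

On the real data this would be called as `summarize_quality(downtime)`. Columns with a non-zero "missing" count deserve a closer look before computing averages.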

Understand the Dataset

From the columns described above, here is what to focus on:

  • Date column: "Date", which is parsed as a datetime before answering the first question.
  • Categorical columns: "Machine_ID", "Assembly_Line_No", and "Downtime".
  • Numerical columns: all sensor measurements, such as "Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)", and "Torque(Nm)".
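One way to check this split rather than relying on the column names is to let pandas group columns by dtype. This is a minimal sketch on an invented sample; the values below are assumptions for illustration only, and the real table has more columns.

```python
import pandas as pd

# Hypothetical sample with the same kinds of columns as the real table
sample = pd.DataFrame({
    "Date": ["31-12-2021", "01-01-2022"],
    "Machine_ID": ["A", "B"],
    "Torque(Nm)": [24.5, 30.1],
    "Downtime": ["down", "up"],
})

# Columns with an integer or float dtype
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else: text labels and dates that have not been parsed yet
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numeric_cols)
print("Other:", other_cols)
```

Running the same two `select_dtypes` calls on `downtime` would show whether any measurement column was read as text, which usually points to stray characters in the file.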

Using histograms to visualize numerical data distributions:

import matplotlib.pyplot as plt

# Sensor columns to plot, named as in the data description
numerical_columns = [
    "Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)",
    "Air_System_Pressure(bar)", "Coolant_Temperature",
    "Hydraulic_Oil_Temperature", "Spindle_Bearing_Temperature",
    "Spindle_Vibration", "Tool_Vibration", "Spindle_Speed(RPM)",
    "Voltage(volts)", "Torque(Nm)", "Cutting(KN)"
]
# Keep only the names actually present, so a mismatched header is skipped
# instead of raising a KeyError
available_columns = [col for col in numerical_columns if col in downtime.columns]
# One histogram per column, 20 bins each
downtime[available_columns].hist(figsize=(15, 10), bins=20)
plt.tight_layout()
plt.show()
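  1. What are the first and last dates that readings were taken on?
# Parse the day-month-year strings into datetimes so that min() and max()
# compare chronologically rather than alphabetically
downtime["Date"] = pd.to_datetime(downtime["Date"], format="%d-%m-%Y")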
first_date = downtime['Date'].min()
last_date = downtime['Date'].max()

print(f"First date: {first_date}")
print(f"Last date: {last_date}")
  2. What is the average Torque?
average_torque = downtime['Torque(Nm)'].mean()
print(f"Average Torque: {average_torque:.2f} Nm")
  3. Which assembly line has the highest readings of machine downtime?
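The notebook leaves this question unanswered. A minimal sketch of one approach is to keep only the rows where the machine was down and count them per assembly line. The helper `downtime_counts`, the labels "down" and "up", and the line names are all hypothetical; the data description does not say how the "Downtime" column is encoded, so the label has to be passed in explicitly.

```python
import pandas as pd


def downtime_counts(df, failure_label,
                    line_col="Assembly_Line_No", status_col="Downtime"):
    """Count downtime readings per assembly line, highest first."""
    is_down = df[status_col] == failure_label
    return df.loc[is_down, line_col].value_counts()


# Hypothetical sample: labels and line names are invented for illustration
sample = pd.DataFrame({
    "Assembly_Line_No": ["Line-A", "Line-B", "Line-A",
                         "Line-C", "Line-A", "Line-B"],
    "Downtime": ["down", "down", "up", "up", "down", "up"],
})

counts = downtime_counts(sample, failure_label="down")
worst_line = counts.idxmax()  # line with the most downtime readings

print(counts)
print(f"Most downtime readings: {worst_line}")
```

On the real table, inspecting `downtime["Downtime"].unique()` first would reveal which value marks a machine as down, and that value would then be passed as `failure_label`. Raw counts answer the question as worded; if the lines have very different numbers of readings, a downtime rate per line may be a fairer comparison.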