Examining Factors Responsible for Heart Attacks
Objective:
Cardiovascular diseases are the leading cause of death globally. This analysis aims to identify the leading factors associated with cardiovascular disease and uses a Logistic Regression model to predict the outcome on the test data. Lastly, we will build a confusion matrix and report model performance in terms of Recall, Precision and Accuracy.
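As a reminder of how the three reported metrics relate to the confusion matrix, here is a minimal sketch in plain Python. The helper name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this analysis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp / fp / fn / tn are the true-positive, false-positive,
    false-negative and true-negative counts of a binary classifier.
    """
    total = tp + fp + fn + tn
    # Accuracy: share of all predictions that were correct
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that were truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for illustration only (not output of this notebook)
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

In a medical screening setting, recall is often watched most closely, because a false negative (a missed at-risk patient) is usually costlier than a false positive.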
Variable Descriptions:
age: age in years
sex: (1 = male; 0 = female)
cp: chest pain type
- Value 0: typical angina
- Value 1: atypical angina
- Value 2: non-anginal pain
- Value 3: asymptomatic
trestbps: resting blood pressure (in mm Hg)
chol: serum cholesterol in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg: resting electrocardiographic results
- Value 0: normal
- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
ca: number of major vessels (0-3) colored by fluoroscopy
thal: thalassemia types:
- thal value 0 = Silent carrier
- thal value 1 = Mild carrier
- thal value 2 = Reversible carrier
- thal value 3 = Fixed defect carrier
target: 0 = less chance of heart attack, 1 = more chance of heart attack
1. Import Modules & Data
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
df = pd.read_excel('heart data.xlsx')
df.info()
- Dataset has 14 columns and 303 rows (inc header)
- There appear to be no missing values
- All columns contain int64 or float64 datatypes
df.shape
2. Data Wrangling
# check missing values
df.isnull().sum()
# check for duplicates
df.duplicated().any()
# drop duplicates and keep first occurrence
df.drop_duplicates(keep='first', inplace=True)
df.reset_index(drop=True, inplace=True)
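To see what the duplicate handling above does, here is a small sketch on a toy DataFrame. The frame `toy` and its values are made up for illustration and are not taken from 'heart data.xlsx'. It uses the non-inplace form of the same pandas calls so the original frame stays available for comparison.

```python
import pandas as pd

# Toy frame with made-up values; the last row repeats the first exactly
toy = pd.DataFrame({
    "age": [63, 37, 63],
    "chol": [233, 250, 233],
    "target": [1, 0, 1],
})

# duplicated() flags each row that repeats an earlier row
has_dupes = toy.duplicated().any()
n_dupes = int(toy.duplicated().sum())

# Keep the first occurrence of each row, then rebuild a clean 0..n-1 index
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(has_dupes, n_dupes)
print(deduped)
```

Counting duplicates with `duplicated().sum()` before dropping them records how many rows were removed, which `duplicated().any()` alone does not show.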