ASCVD UNDERSTANDING AND PREDICTION

Introduction
Data & Methodology
Data Understanding
Data Preparation
Exploratory Analysis
Modeling & Evaluation
Conclusion & Recommendations

1. Introduction

1.1. Problem Statement

Atherosclerosis is a disease in which plaque builds up in the walls of the arteries, making them thicker and less elastic. It is a leading cause of death globally and can lead to serious health problems such as heart attacks, strokes, and damage to the arteries in the legs. Risk factors include high cholesterol, diabetes, smoking, family history, a sedentary lifestyle, being overweight, and high blood pressure. Symptoms result from blood flow that is reduced or blocked by plaque buildup, and they vary depending on the artery affected. Diagnosis relies on medical examinations such as angiography or ultrasonography. Treatment includes modifying risk factors, making lifestyle changes, and taking antiplatelet and antiatherogenic drugs. By identifying and analyzing risk factors, we can estimate the likelihood of developing atherosclerotic cardiovascular disease (ASCVD) and take preventive measures accordingly.

1.2. Project Objective

This personal project aims to:

Analyze the impact of different factors, such as age, gender, medical examination results, etc., on the development of cardiovascular disease.
Build a machine learning model to predict the presence or absence of cardiovascular disease using those features.

1.3. Executive Summary

This report explores the impact of demographics, physical characteristics, health examination results, and lifestyle on the development of ASCVD. Through this analysis, we identified the features most strongly associated with a higher risk of ASCVD and built a machine learning model (F1 score of 0.74 and ROC-AUC of 0.80) to predict the presence or absence of cardiovascular disease.

The analysis produced several findings that can inform preventive measures to reduce the risk of ASCVD:

Age, high blood pressure, high blood glucose, high cholesterol, and obesity are high-risk factors for ASCVD. When these factors combine, they increase the risk even further.
A lifestyle with regular physical activity can help reduce the risk of having ASCVD.
Smoking and alcohol intake did not appear to raise the risk of ASCVD on their own in this dataset. However, people who smoke or drink can still increase their risk of cardiovascular disease by leading a sedentary lifestyle and continuing these habits.

With this information, we can take proactive steps to reduce the risk of ASCVD and improve our overall health.

2. Data and Methodology

2.1. The Data

The data used in this project is taken from Kaggle (source).

There are 3 types of input features:

Objective: patient's demographics;
Examination: results of medical examination;
Subjective: information given by the patient (lifestyle).

Feature	Variable Type	Variable	Value Type
Age	Objective Feature	age	int (days)
Height	Objective Feature	height	int (cm)
Weight	Objective Feature	weight	float (kg)
Gender	Objective Feature	gender	categorical code (1 - women, 2 - men)
Systolic blood pressure	Examination Feature	ap_hi	int
Diastolic blood pressure	Examination Feature	ap_lo	int
Cholesterol	Examination Feature	cholesterol	1: normal, 2: above normal, 3: well above normal
Glucose	Examination Feature	gluc	1: normal, 2: above normal, 3: well above normal
Smoking	Subjective Feature	smoke	binary
Alcohol intake	Subjective Feature	alco	binary
Physical activity	Subjective Feature	active	binary
Presence or absence of cardiovascular disease	Target Variable	cardio	binary

All of the dataset values were collected at the moment of medical examination.
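
As a minimal sketch of how the encodings in the table above might be decoded for readability: the column names come from the table, while the sample rows, the derived column names (`age_years`, `gender_label`, `cholesterol_label`), and the 365.25-days-per-year conversion are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows that follow the schema in the table above;
# the values are illustrative, not taken from the Kaggle dataset.
sample = pd.DataFrame(
    {
        "age": [18250, 21900],    # days
        "height": [168, 156],     # cm
        "weight": [62.0, 85.0],   # kg
        "gender": [2, 1],         # 1 - women, 2 - men
        "cholesterol": [1, 3],    # 1: normal, 2: above normal, 3: well above normal
        "cardio": [0, 1],         # binary target
    }
)

# Convert age from days to years (assumption: 365.25 days per year).
sample["age_years"] = sample["age"] / 365.25

# Map categorical codes to readable labels.
sample["gender_label"] = sample["gender"].map({1: "women", 2: "men"})
level_labels = {1: "normal", 2: "above normal", 3: "well above normal"}
sample["cholesterol_label"] = sample["cholesterol"].map(level_labels)

print(sample[["age_years", "gender_label", "cholesterol_label"]])
```

The same `level_labels` mapping would apply to `gluc`, which uses the same three-level coding.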

2.2. Methodology

Our process of analyzing data involves various methods as detailed below:

Data Understanding

Collecting the initial data from Kaggle and importing it into a pandas DataFrame.
Utilizing several tools and techniques to comprehend the structure, contents, and quality of the data and identify potential issues that require further investigation or correction.

Data Preparation

Data cleaning: Using a data cleaning checklist to recognize and resolve any quality problems with the data, including issues with data constraints, text and categorical data, data uniformity, and missing data.
Data transformation: Modifying and creating new variables from existing data to make it more appropriate for our analysis objectives.
Data validation: Verifying and validating the data after cleaning and transforming.
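
The three preparation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the cleaning checklist used in this project: the sample rows are made up, the constraint that systolic pressure must exceed diastolic pressure is an assumed rule, and the derived columns `age_years` and `bmi` are hypothetical names.

```python
import pandas as pd

# Hypothetical raw rows using the dataset's column names; values are illustrative.
raw = pd.DataFrame(
    {
        "age": [18250, 21900, 20000, 20000],   # days
        "height": [168, 156, 170, 170],        # cm
        "weight": [62.0, 85.0, 70.0, 70.0],    # kg
        "ap_hi": [120, 140, 80, 80],
        "ap_lo": [80, 90, 120, 120],           # last two rows: systolic < diastolic
    }
)

# Data cleaning: drop exact duplicates, then drop rows that violate a simple
# constraint (assumed rule: systolic pressure must exceed diastolic pressure).
clean = raw.drop_duplicates()
clean = clean[clean["ap_hi"] > clean["ap_lo"]].copy()

# Data transformation: derive age in years and body mass index (kg / m^2).
clean["age_years"] = clean["age"] / 365.25
clean["bmi"] = clean["weight"] / (clean["height"] / 100) ** 2

# Data validation: confirm there are no missing values and the constraint holds.
assert clean.notna().all().all()
assert (clean["ap_hi"] > clean["ap_lo"]).all()

print(clean)
```

Keeping the validation checks as assertions makes the notebook fail loudly if a later change to the cleaning rules lets invalid rows through.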

Exploratory Analysis

Conducting data analysis, statistical tests and visualizing data to discover insights.
Our exploratory data analysis includes univariate, bivariate, and multivariate analysis tasks.
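
To illustrate the three levels of analysis named above, here is a small sketch on made-up data. The column names follow the dataset, but the values and the choice of `cholesterol` and `active` as example features are assumptions; statistical tests and plots are left out to keep it short.

```python
import pandas as pd

# Hypothetical toy data with the dataset's column names; values are illustrative.
df = pd.DataFrame(
    {
        "cholesterol": [1, 1, 1, 2, 2, 3, 3, 3],
        "active":      [1, 1, 0, 1, 0, 0, 1, 0],
        "cardio":      [0, 0, 1, 0, 1, 1, 1, 1],
    }
)

# Univariate: distribution of the target variable.
target_share = df["cardio"].value_counts(normalize=True)

# Bivariate: share of positive cases within each cholesterol level.
rate_by_chol = df.groupby("cholesterol")["cardio"].mean()

# Multivariate: positive rate by cholesterol level and physical activity.
rate_by_chol_active = df.pivot_table(
    index="cholesterol", columns="active", values="cardio", aggfunc="mean"
)

print(target_share, rate_by_chol, rate_by_chol_active, sep="\n\n")
```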

Modeling & Evaluation

Building a machine learning model on the training dataset, using the features identified during exploratory analysis.
Fine-tuning the model's hyperparameters and evaluating the trained model on an unseen test dataset.
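
A minimal sketch of this train, tune, and evaluate loop is shown below. It runs on synthetic stand-in data, and the choice of `RandomForestClassifier`, the parameter grid, and the 75/25 split are assumptions for illustration rather than the configuration used in this project. F1 and ROC-AUC are computed because they are the metrics reported in the executive summary.

```python
import numpy as np
import sklearn.ensemble as sklen
import sklearn.metrics as sklme
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): two numeric features and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hold out an unseen test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune a small, illustrative hyperparameter grid with cross-validation
# on the training set only.
search = GridSearchCV(
    sklen.RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test set.
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
f1 = sklme.f1_score(y_test, y_pred)
roc_auc = sklme.roc_auc_score(y_test, y_prob)

print(f"Best params: {search.best_params_}")
print(f"F1: {f1:.2f}, ROC-AUC: {roc_auc:.2f}")
```

Tuning with cross-validation on the training split keeps the test split untouched until the final evaluation, so the reported scores are not inflated by the parameter search.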

2.3. Importing Libraries

To make our data analysis, visualization, and modeling more efficient, we have developed some user-defined function modules. To use them, please ensure that the module files are copied to the same directory as this notebook.

(+) Libraries


# Pandas
import pandas as pd
pd.set_option("display.max_columns", None)

# Standard library
import glob
import math
import os
import re
import sys

# NumPy
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
import sklearn.tree as skltr
import sklearn.ensemble as sklen
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
import sklearn.metrics as sklme

(+) User-defined Modules