
ASCVD Understanding and Prediction

Table of Contents

  1. Introduction
  2. Data & Methodology
  3. Data Understanding
  4. Data Preparation
  5. Exploratory Analysis
  6. Modeling & Evaluation
  7. Conclusion & Recommendations

1. Introduction

1.1. Problem Statement

Atherosclerosis is a disease in which the artery walls thicken and lose elasticity. It is a leading cause of death globally and can lead to heart attacks, strokes, and damage to the arteries of the legs. Risk factors include high cholesterol, diabetes, smoking, family history, a sedentary lifestyle, excess weight, and high blood pressure. Symptoms arise when plaque buildup reduces or blocks blood flow, and they vary with the artery affected. Diagnosis relies on medical examinations such as angiography or ultrasonography. Treatment includes modifying risk factors, making lifestyle changes, and taking antiplatelet and antiatherogenic drugs. By identifying and analyzing risk factors, we can predict the likelihood of developing atherosclerotic cardiovascular disease (ASCVD) and take preventive measures accordingly.

1.2. Project Objective

This personal project aims to:

  • Analyze the impact of different factors, such as age, gender, medical examination results, etc., on the development of cardiovascular disease.

  • Build a machine learning model to predict the presence or absence of cardiovascular disease using those features.

1.3. Executive Summary

This report explores how demographics, physical characteristics, health examination results, and lifestyle affect the development of ASCVD. Through this analysis, we identified critical features that increase the risk of ASCVD and built a promising machine learning model (F1 score of 0.74 and ROC-AUC of 0.80) to predict the presence or absence of cardiovascular disease.

The analysis produced several significant findings that can guide preventive measures against ASCVD:

  • Age, high blood pressure, high blood glucose, high cholesterol, and obesity are high-risk factors for ASCVD. When these factors combine, they increase the risk even further.
  • A lifestyle with regular physical activity can help reduce the risk of having ASCVD.
  • Smoking and alcohol intake were not, on their own, associated with a higher risk of ASCVD in this dataset, but combining these habits with a sedentary lifestyle can still increase the risk of cardiovascular disease.

With this information, we can take proactive steps to reduce the risk of ASCVD and improve our overall health.

2. Data & Methodology

2.1. The Data

The data used in this project is taken from Kaggle (source).

There are 3 types of input features:

  • Objective: patient's demographics;
  • Examination: results of medical examination;
  • Subjective: information given by the patient (lifestyle).

| Feature | Variable Type | Variable | Value Type |
|---|---|---|---|
| Age | Objective Feature | age | int (days) |
| Height | Objective Feature | height | int (cm) |
| Weight | Objective Feature | weight | float (kg) |
| Gender | Objective Feature | gender | categorical code (1 - women, 2 - men) |
| Systolic blood pressure | Examination Feature | ap_hi | int |
| Diastolic blood pressure | Examination Feature | ap_lo | int |
| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
| Smoking | Subjective Feature | smoke | binary |
| Alcohol intake | Subjective Feature | alco | binary |
| Physical activity | Subjective Feature | active | binary |
| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |

All of the dataset values were collected at the moment of medical examination.
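
Because age is stored in days and height in centimeters, these columns usually need converting before they are easy to read. The snippet below is a minimal sketch of that step on two made-up rows that follow the schema above. The derived column names `age_years` and `bmi` are hypothetical, and the report does not state that these exact transformations are the ones applied later.

```python
import pandas as pd

# Two made-up rows that follow the schema above (not real patient records).
sample = pd.DataFrame({
    "age": [18250, 21900],    # age in days
    "height": [170, 160],     # height in cm
    "weight": [70.0, 80.0],   # weight in kg
    "gender": [2, 1],         # 1 - women, 2 - men
    "cardio": [0, 1],         # target variable
})

# Convert age from days to years (365.25 days per year is an assumption).
sample["age_years"] = sample["age"] / 365.25

# Body mass index: weight in kg divided by the square of height in meters.
sample["bmi"] = sample["weight"] / (sample["height"] / 100) ** 2

print(sample[["age_years", "bmi"]].round(2))
```

A derived measure such as BMI is one way to quantify the obesity factor mentioned in the executive summary from the raw height and weight columns.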

2.2. Methodology

Our process of analyzing data involves various methods as detailed below:

Data Understanding

  • Collecting the initial data from Kaggle and importing it into the DataFrame.
  • Utilizing several tools and techniques to comprehend the structure, contents, and quality of the data and identify potential issues that require further investigation or correction.
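
As an illustration of the data understanding step, the sketch below runs a few standard pandas checks on a small made-up DataFrame. The rows are invented for the example, and the specific checks are an assumption about what "tools and techniques" covers here rather than the report's exact procedure.

```python
import pandas as pd

# Made-up rows with one missing value and one duplicated row.
df = pd.DataFrame({
    "age": [18250, 21900, 21900, 20000],
    "weight": [70.0, 80.0, 80.0, None],
    "cardio": [0, 1, 1, 0],
})

shape = df.shape                    # (number of rows, number of columns)
dtypes = df.dtypes                  # data type of each column
missing = df.isna().sum()           # missing values per column
n_dupes = int(df.duplicated().sum())  # rows identical to an earlier row
summary = df.describe()             # basic statistics for numeric columns

print(shape, n_dupes)
print(missing)
```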

Data Preparation

  • Data cleaning: Using a data cleaning checklist to recognize and resolve any quality problems with the data, including issues with data constraints, text and categorical data, data uniformity, and missing data.
  • Data transformation: Modifying and creating new variables from existing data to make it more appropriate for our analysis objectives.
  • Data validation: Verifying and validating the data after cleaning and transforming.
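
The data constraint and categorical checks described above can be sketched as follows. The rows are made up, and the rules used here (positive blood pressure values, systolic not lower than diastolic, category codes limited to those listed in the feature table) are assumptions for illustration, not the report's actual cleaning checklist.

```python
import pandas as pd

# Made-up rows containing deliberate quality problems.
raw = pd.DataFrame({
    "ap_hi": [120, 90, -150, 140],
    "ap_lo": [80, 110, 90, 90],
    "cholesterol": [1, 2, 3, 5],
    "gender": [1, 2, 2, 1],
})

# Allowed category codes, taken from the feature table.
valid_codes = {"cholesterol": [1, 2, 3], "gender": [1, 2]}

# Constraint check: pressures must be positive and systolic >= diastolic.
bad_pressure = (
    (raw["ap_hi"] <= 0) | (raw["ap_lo"] <= 0) | (raw["ap_hi"] < raw["ap_lo"])
)

# Categorical check: flag any code outside the allowed set.
bad_codes = pd.Series(False, index=raw.index)
for col, codes in valid_codes.items():
    bad_codes |= ~raw[col].isin(codes)

# Keep only rows that pass every check.
clean = raw[~(bad_pressure | bad_codes)].reset_index(drop=True)
print(f"kept {len(clean)} of {len(raw)} rows")
```

Dropping the flagged rows is only one option. Depending on how many records are affected, correcting or imputing the values may be preferable.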

Exploratory Analysis

  • Analyzing the data, running statistical tests, and visualizing the results to discover insights.
  • Our exploratory data analysis includes univariate, bivariate, and multivariate analysis tasks.
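
A typical bivariate task is comparing the disease rate across the levels of a categorical feature. The sketch below does this for `cholesterol` against `cardio` on made-up values, so the rates it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up values for one categorical feature and the target.
toy = pd.DataFrame({
    "cholesterol": [1, 1, 1, 1, 2, 2, 3, 3],
    "cardio":      [0, 0, 1, 0, 1, 0, 1, 1],
})

# Share of positive cases within each cholesterol level.
rate_by_chol = toy.groupby("cholesterol")["cardio"].mean()

# The same comparison as a row-normalized contingency table.
table = pd.crosstab(toy["cholesterol"], toy["cardio"], normalize="index")

print(rate_by_chol)
print(table)
```

The same pattern extends to multivariate analysis by grouping on several columns at once, for example `toy.groupby(["cholesterol", "gluc"])` if a `gluc` column is present.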

Modeling & Evaluation

  • Building a machine learning model on the training dataset with the identified findings.
  • Fine-tuning the model parameters and evaluating the trained model on the unseen dataset.
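
The train-then-evaluate workflow can be sketched as below. This is a minimal example on synthetic data, not the report's model: the decision tree, its `max_depth`, the 75/25 split, and the random features are all assumptions chosen for illustration. It computes the two metrics quoted in the executive summary, F1 score and ROC-AUC.

```python
import numpy as np
import sklearn.tree as skltr
import sklearn.metrics as sklme
from sklearn.model_selection import train_test_split

# Synthetic features and labels (assumption: stand-ins for the real data).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
signal = 1.5 * X[:, 0] + 0.5 * X[:, 1]
y = (signal + rng.normal(scale=1.0, size=n) > 0).astype(int)

# Hold out 25% of the rows as the unseen evaluation set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a shallow decision tree on the training set only.
model = skltr.DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# F1 uses hard class predictions; ROC-AUC uses predicted probabilities.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
f1 = sklme.f1_score(y_test, y_pred)
roc_auc = sklme.roc_auc_score(y_test, y_prob)

print(f"F1: {f1:.2f}, ROC-AUC: {roc_auc:.2f}")
```

Scoring on rows the model never saw during fitting is what makes the reported metrics an estimate of performance on new patients rather than a measure of how well the model memorized the training set.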

2.3. Importing Libraries

To make our data analysis, visualization, and modeling more efficient, we have developed some user-defined function modules. To use them, please ensure that the module files are copied to the same directory as this notebook.

(+) Libraries


# Pandas
import pandas as pd
pd.set_option("display.max_columns", None)

# Numpy and others
import os, sys, glob, re, math
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
import sklearn.tree as skltr
import sklearn.ensemble as sklen
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
import sklearn.metrics as sklme

(+) User-defined Modules