Objective:

We will build a linear regression model to forecast sales figures from an advertising spend dataset. We will evaluate it with standard performance metrics such as R-squared and root mean squared error (RMSE), and use k-fold cross-validation and regularization to limit the risk of overfitting.
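As a rough illustration of the two metrics named above, the sketch below computes R-squared and RMSE with scikit-learn. The arrays `y_true` and `y_pred` are made-up values chosen for the example and are not taken from any dataset used here.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

# R-squared: share of the variance in y_true explained by the predictions.
r2 = r2_score(y_true, y_pred)

# RMSE: square root of the mean squared error, in the units of the target.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"R^2: {r2:.3f}, RMSE: {rmse:.3f}")
```

R-squared is unitless and at most 1, while RMSE is expressed in the units of the target, so the two are usually reported together.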

import pandas as pd
import numpy as np
import warnings

pd.set_option('display.expand_frame_repr', False)

warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

1. Regression

Predicting blood glucose levels:

Let's use a dataset containing women's health data to predict blood glucose levels.

df = pd.read_csv('diabetes_clean.csv', index_col=None)
df.head()

Creating feature and target arrays:

diabetes_df = df.loc[(df['glucose'] != 0) & (df['bmi'] != 0)].copy()
X = diabetes_df.drop("glucose", axis=1).values
y = diabetes_df["glucose"].values

print(type(X), type(y))
display(X[:5,:])
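The filtering and splitting step above can be sketched on a tiny hand-made frame. The values in `toy` are hypothetical, and treating a zero as an invalid measurement is an assumption made for this example rather than something the dataset documents.

```python
import pandas as pd

# Hypothetical toy data; zeros are assumed to mark invalid measurements.
toy = pd.DataFrame({
    "glucose": [148, 0, 183, 89],
    "bmi": [33.6, 26.6, 0.0, 28.1],
    "age": [50, 31, 32, 21],
})

# Keep only rows where both glucose and bmi are non-zero.
mask = (toy["glucose"] != 0) & (toy["bmi"] != 0)
toy_clean = toy.loc[mask].copy()

# Features are every column except the target; the target is glucose.
X_toy = toy_clean.drop("glucose", axis=1).values
y_toy = toy_clean["glucose"].values

print(X_toy.shape, y_toy.shape)
```

Calling `.copy()` after the boolean mask gives an independent frame, which avoids pandas warnings if the filtered data is modified later.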

Making predictions from a single feature:

To begin, let us try to predict blood glucose levels from a single feature: body mass index (BMI).

X_bmi = X[:,4]
print(X_bmi[:5])
print(y.shape, X_bmi.shape) # confirm both shapes

# scikit-learn expects a 2D feature array, so we reshape to one column
X_bmi = X_bmi.reshape(-1, 1)
print(X_bmi.shape)
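To make the reshape step concrete, here is a minimal sketch on a three-element array. The names `bmi_1d` and `bmi_2d` and the values are made up for illustration.

```python
import numpy as np

# Made-up BMI values; a 1D array has shape (n_samples,).
bmi_1d = np.array([33.6, 26.6, 23.3])

# reshape(-1, 1) keeps the number of rows (-1 means "infer it")
# and forces a single column, giving shape (n_samples, 1).
bmi_2d = bmi_1d.reshape(-1, 1)

print(bmi_1d.shape, bmi_2d.shape)
```

The values are unchanged; only the layout differs, with each observation becoming a row that holds one feature.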

Plotting glucose vs. body mass index:

import matplotlib.pyplot as plt

plt.scatter(X_bmi, y)
plt.ylabel("Blood Glucose (mg/dl)")
plt.xlabel("Body Mass Index")
plt.show()

NOTE:

In general, blood glucose levels tend to rise as body mass index increases, although the points are widely scattered.

Fitting a regression model

Now we fit a linear regression model to the data and draw its predictions over the scatter plot.

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X_bmi, y)
predictions = reg.predict(X_bmi)
plt.scatter(X_bmi,y)
plt.plot(X_bmi, predictions, color='black')
plt.ylabel("Blood Glucose (mg/dl)")
plt.xlabel("Body Mass Index")
plt.show()

There appears to be a weak to moderate positive correlation between blood glucose and body mass index.
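One way to put a number on the strength of such a relationship is the Pearson correlation coefficient. The sketch below uses synthetic data, not the diabetes dataset, and every constant in it is an arbitrary assumption chosen only to produce a noisy upward trend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a noisy positive trend (all constants are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(32, 7, size=200)
y_syn = 90 + 1.0 * x + rng.normal(0, 30, size=200)

# Pearson correlation between the single feature and the target.
r = np.corrcoef(x, y_syn)[0, 1]

# Fit the same kind of one-feature model and read off slope and R-squared.
model = LinearRegression().fit(x.reshape(-1, 1), y_syn)
r2 = model.score(x.reshape(-1, 1), y_syn)

print(f"r = {r:.3f}, slope = {model.coef_[0]:.3f}, R^2 = {r2:.3f}")
```

For a regression with one feature and an intercept, R-squared on the training data equals the square of the Pearson correlation, so a weak correlation implies that the line explains only a small share of the variance.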