Using the Regression
Objective:
We will create a linear regression model to forecast sales figures using an advertising spend dataset. We will evaluate it with standard performance metrics such as R-squared and root mean squared error, and we will use k-fold cross-validation and regularization to limit the risk of overfitting in regression models.
import pandas as pd
import numpy as np
import warnings
pd.set_option('display.expand_frame_repr', False)
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
1. Regression
Predicting blood glucose levels:
Let's use a dataset containing women's health data to predict blood glucose levels.
df = pd.read_csv('diabetes_clean.csv', index_col=None)
df.head()
Creating feature and target arrays:
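# Keep only rows with nonzero glucose and BMI; zeros are treated here as invalid readings
diabetes_df = df.loc[(df['glucose'] != 0) & (df['bmi'] != 0)].copy()
# Features: every column except the target
X = diabetes_df.drop("glucose", axis=1).values
# Target: the glucose column
y = diabetes_df["glucose"].values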
print(type(X), type(y))
display(X[:5,:])
Making predictions from a single feature:
To begin, let us try to predict blood glucose levels from a single feature: body mass index (BMI).
X_bmi = X[:,4]
print(X_bmi[:5])
print(y.shape, X_bmi.shape) # confirm its shape
# scikit-learn expects a 2D feature array, so reshape into a single column
X_bmi = X_bmi.reshape(-1, 1)
print(X_bmi.shape)
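The reshape step can be checked on a toy array. The sketch below uses made-up BMI values (not taken from the dataset) to show how `reshape(-1, 1)` turns a 1D array of n values into an n-by-1 column, which is the layout scikit-learn estimators expect for features.

```python
import numpy as np

# Hypothetical BMI values, for illustration only
bmi = np.array([33.6, 26.6, 23.3, 28.1, 43.1])
print(bmi.shape)      # 1D: one axis of length 5

# -1 lets NumPy infer the number of rows; 1 fixes a single column
bmi_2d = bmi.reshape(-1, 1)
print(bmi_2d.shape)   # 2D: 5 rows, 1 column
```

The values are unchanged; only the shape differs, so each row now holds one observation with one feature.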
Plotting glucose vs. body mass index:
import matplotlib.pyplot as plt
plt.scatter(X_bmi, y)
plt.ylabel("Blood Glucose (mg/dl)")
plt.xlabel("Body Mass Index")
plt.show()
NOTE:
The scatter plot suggests that blood glucose levels tend to rise as body mass index increases, although the points are widely spread.
Fitting a regression model
We now fit a linear regression model to the data and draw its predictions as a line over the scatter plot.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_bmi, y)
predictions = reg.predict(X_bmi)
plt.scatter(X_bmi,y)
plt.plot(X_bmi, predictions, color='black')
plt.ylabel("Blood Glucose (mg/dl)")
plt.xlabel("Body Mass Index")
plt.show()
There appears to be a weak to moderate positive correlation between blood glucose and body mass index.
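A visual impression like this can be backed by numbers. The sketch below is a minimal illustration on synthetic data (the `bmi` and `glucose` arrays are generated here as an assumption and are not the diabetes dataset); the names `X_demo` and `reg_demo` are hypothetical. It computes the Pearson correlation coefficient with `np.corrcoef` and the R-squared of the fitted line with `LinearRegression.score`. For a single-feature linear regression with an intercept, R-squared on the training data equals the square of the Pearson correlation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the diabetes dataset)
rng = np.random.default_rng(0)
bmi = rng.uniform(18, 50, size=200)
glucose = 90 + 1.0 * bmi + rng.normal(0, 25, size=200)

# Fit the same kind of single-feature model as above
X_demo = bmi.reshape(-1, 1)
reg_demo = LinearRegression().fit(X_demo, glucose)

# Pearson correlation between the feature and the target
r = np.corrcoef(bmi, glucose)[0, 1]
# R-squared of the fitted line on the data it was trained on
r_squared = reg_demo.score(X_demo, glucose)

print(f"slope: {reg_demo.coef_[0]:.2f}, intercept: {reg_demo.intercept_:.2f}")
print(f"Pearson r: {r:.2f}, R-squared: {r_squared:.2f}")
```

Applying the same two calls to `X_bmi` and `y` would give a concrete value to compare against the "weak to moderate" description.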