Starbucks & Multi-linear Regression (Python)
In this brief exploration, we will look at a dataset containing information about drinks from Starbucks. Our goal is to build a basic multiple linear regression model that estimates calorie counts from factors such as fat, carbohydrates, protein, and other nutrients.

```
# These are the libraries we'll need
import pandas as pd
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
```

Let's start by reading the CSV file into a DataFrame called `drinks`.

```
drinks = pd.read_csv("starbucks-menu-nutrition-drinks.csv")
drinks
```

Next, we'll re-read the file so that the first column becomes the index and "-" entries are treated as missing values, then drop the rows containing missing values and rename the remaining columns for easier referencing.

```
drinks = pd.read_csv("starbucks-menu-nutrition-drinks.csv", index_col=0, na_values=["-"])
drinks = drinks.dropna(axis=0)
drinks.columns = ["calories", "fat", "carbs", "fiber", "protein", "sodium"]
```
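To see what each cleaning step does, here is a minimal sketch on a tiny inline sample. The drink names, column headers, and numbers are made up for illustration and are not taken from the Starbucks file; only the `read_csv`, `dropna`, and column-renaming calls mirror the steps above.

```python
import io
import pandas as pd

# Hypothetical three-row sample; names and numbers are invented for illustration.
raw = io.StringIO(
    "Drink,Calories,Fat (g),Carb. (g)\n"
    "Sample Tea,60,0,15\n"
    "Sample Juice,-,-,-\n"
    "Sample Latte,190,7,19\n"
)

# index_col=0 uses the first column as the index;
# na_values=["-"] turns every "-" cell into NaN.
sample = pd.read_csv(raw, index_col=0, na_values=["-"])

# Drop every row that has at least one missing value.
sample = sample.dropna(axis=0)

# Shorter column names are easier to reference in a model formula later.
sample.columns = ["calories", "fat", "carbs"]

print(sample)
```

Without `na_values=["-"]`, the "-" cells would be read as text, the columns would not be numeric, and `dropna` would leave the incomplete row in place.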

We can see that the dataframe is much easier to work with now.

``drinks``

Let's sort the drinks by calories in descending order. We can see that, at least in this dataset, Starbucks Signature Hot Chocolate has the most calories.

``drinks.sort_values(by="calories", ascending=False)``
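Note that `sort_values` returns a new, sorted DataFrame and leaves the original unchanged. The sketch below shows this on a made-up three-row frame (the drink names and calorie values are hypothetical), along with `nlargest` as one alternative when only the top few rows are needed.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
toy = pd.DataFrame(
    {"calories": [130, 430, 60]},
    index=["Drink A", "Drink B", "Drink C"],
)

# sort_values returns a sorted copy; toy itself keeps its original order.
ranked = toy.sort_values(by="calories", ascending=False)
top_name = ranked.index[0]

# nlargest picks the top rows directly.
top2 = toy.nlargest(2, "calories")

print(top_name)
print(top2)
```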

Let's explore the relationship between calories and carbohydrates. From the scatter plot and regression line, we can see that there is a strong positive relationship.

```
sns.regplot(x="carbs", y="calories", data=drinks)
plt.title("Correlation Between Calories and Carbs")
```
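A plot shows the trend visually; one way to put a number on it is the Pearson correlation coefficient. The sketch below uses a small made-up frame (the values are hypothetical, not from the dataset). On the real data, the same `corr` call could be applied to `drinks["carbs"]` and `drinks["calories"]`.

```python
import pandas as pd

# Hypothetical values chosen to rise together, for illustration only.
toy = pd.DataFrame(
    {
        "carbs": [10, 20, 30, 40, 50],
        "calories": [50, 95, 130, 180, 210],
    }
)

# Series.corr computes the Pearson correlation by default.
r = toy["carbs"].corr(toy["calories"])

print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little linear relationship.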

Next, we'll fit a multiple linear regression model with calories as the dependent variable and fat, carbs, fiber, protein, and sodium as independent variables, then display a summary of the model. The R-squared value of 0.997 means the model explains about 99.7% of the variance in calories in this dataset, which indicates a very close fit.

```
lm_drinks = smf.ols(formula="calories ~ fat + carbs + fiber + protein + sodium", data=drinks).fit()
lm_drinks.summary()
```
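To make the R-squared figure less of a black box, here is a minimal sketch that fits ordinary least squares with NumPy instead of statsmodels and computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data is synthetic: it assumes calories follow the general 9/4/4 kcal-per-gram factors for fat, carbohydrates, and protein plus a little noise, which is an assumption for illustration and not something read from the Starbucks file.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Synthetic nutrient amounts in grams (made up, not from the Starbucks file).
fat = rng.uniform(0, 20, n)
carbs = rng.uniform(0, 60, n)
protein = rng.uniform(0, 15, n)

# Assumption: calories follow the 9/4/4 kcal-per-gram factors plus small noise.
calories = 9 * fat + 4 * carbs + 4 * protein + rng.normal(0, 2, n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), fat, carbs, protein])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, calories, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares).
predicted = X @ coef
ss_res = np.sum((calories - predicted) ** 2)
ss_tot = np.sum((calories - calories.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("coefficients (intercept, fat, carbs, protein):", np.round(coef, 2))
print("R-squared:", round(r_squared, 4))
```

Because calories are largely determined by these nutrients, a linear model on such data is expected to fit very closely, which is consistent with the high R-squared reported in the summary above.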