### In this brief exploration, we will look at a dataset of nutrition information for Starbucks drinks. Our goal is to build a basic multiple linear regression model that estimates calories from fat, carbohydrates, fiber, protein, and sodium.

```
#These are the libraries we'll need
import pandas as pd
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
```

**Let's start by reading the csv file into a dataframe called "drinks".**

`drinks = pd.read_csv("starbucks-menu-nutrition-drinks.csv")`

`drinks`

**Next, we'll re-read the file, treating "-" as a missing value and using the first column as the index. We then drop the rows that contain missing values and rename the remaining columns so they are easier to reference.**

```
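# Re-read the file: the first column becomes the index and "-" is parsed as NaN
drinks = pd.read_csv("starbucks-menu-nutrition-drinks.csv", index_col=0, na_values=["-"])
# Drop every row that has at least one missing value
drinks = drinks.dropna(axis=0)
# Short, lowercase names are easier to use in a model formula
drinks.columns = ["calories", "fat", "carbs", "fiber", "protein", "sodium"]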
```
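**To make the effect of `na_values` and `dropna` concrete, here is a minimal sketch on a made-up three-row sample. The drink names, column headers, and numbers are hypothetical and are not taken from the real file.**

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; all values here are invented
sample_csv = io.StringIO(
    "name,Calories,Fat (g)\n"
    "Drink A,100,2\n"
    "Drink B,-,-\n"
    "Drink C,250,9\n"
)

# "-" is parsed as NaN, so the affected columns stay numeric
raw = pd.read_csv(sample_csv, index_col=0, na_values=["-"])

# Count the missing values in each column before dropping anything
missing_per_column = raw.isna().sum()

# Keep only the rows that have no missing values
clean = raw.dropna(axis=0)
print(missing_per_column)
print(clean)
```

**Running the same `isna().sum()` check on the real data before calling `dropna` shows how many rows are about to be discarded.**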

**We can see that the dataframe is much easier to work with now.**

`drinks`

**Let's sort the drinks by calories in descending order. In this dataset, at least, Starbucks Signature Hot Chocolate has the most calories.**

`drinks.sort_values(by="calories", ascending=False)`
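**If only the top few drinks are of interest, `DataFrame.nlargest` is a compact alternative to sorting the whole table. A small sketch on invented toy rows (the names and calorie values are hypothetical):**

```python
import pandas as pd

# Toy data for illustration only; not values from the Starbucks file
toy = pd.DataFrame(
    {"calories": [430, 10, 250]},
    index=["Hot Chocolate (toy)", "Tea (toy)", "Latte (toy)"],
)

# Return the two rows with the highest calorie counts, highest first
top = toy.nlargest(2, "calories")
print(top)
```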

**Let's explore the relationship between calories and carbohydrates. The scatter plot and fitted regression line show a strong positive relationship.**

```
sns.regplot(x="carbs", y="calories", data = drinks)
plt.title("Correlation Between Calories and Carbs")
```
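**A plot shows the direction of the relationship, and a Pearson correlation coefficient puts a number on its strength. The sketch below uses `Series.corr` on a tiny invented dataset; on the real data the equivalent call would be `drinks["carbs"].corr(drinks["calories"])`, assuming the renamed columns from earlier.**

```python
import pandas as pd

# Toy values chosen to be roughly, but not exactly, linear
toy = pd.DataFrame(
    {
        "carbs": [10, 20, 30, 40],
        "calories": [60, 90, 140, 170],
    }
)

# Pearson correlation: +1 is a perfect positive linear relationship
r = toy["carbs"].corr(toy["calories"])
print(round(r, 3))
```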

**Next, we'll fit a multiple linear regression model with calories as the dependent variable and fat, carbs, fiber, protein, and sodium as the independent variables, then display a summary of the model. The R-squared value of 0.997 means the model explains about 99.7% of the variance in calories in this dataset, which indicates a very close fit.**

```
lm_drinks = smf.ols(formula = 'calories ~ fat + carbs + fiber + protein + sodium', data = drinks).fit()
lm_drinks.summary()
```
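**Beyond the full summary table, the fitted coefficients and R-squared can be read directly from the results object through its `params` and `rsquared` attributes. The sketch below fits the same kind of model on synthetic data, not on the Starbucks file. As an assumption for illustration, the synthetic calories are generated from the commonly cited approximation of about 9 kcal per gram of fat and 4 kcal per gram of carbohydrate or protein, plus random noise, so the fit should recover coefficients close to those values.**

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic nutrient values; nothing here comes from the real dataset
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame(
    {
        "fat": rng.uniform(0, 20, n),
        "carbs": rng.uniform(0, 60, n),
        "protein": rng.uniform(0, 15, n),
    }
)

# Assumed generating rule: 9 kcal/g fat, 4 kcal/g carbs and protein, plus noise
toy["calories"] = (
    9 * toy["fat"] + 4 * toy["carbs"] + 4 * toy["protein"] + rng.normal(0, 5, n)
)

toy_model = smf.ols("calories ~ fat + carbs + protein", data=toy).fit()

# Estimated intercept and slopes, indexed by term name
coefficients = toy_model.params
# Share of the variance in calories explained by the model
r_squared = toy_model.rsquared
print(coefficients)
print(r_squared)
```

**If a similar rule holds for the real drinks, that would help explain why a linear model of the nutrients fits calories so closely. Comparing `lm_drinks.params` with these per-gram values is one way to check.**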