Predicting concrete strength with linear regression, my old friend
## Can you predict the strength of concrete?

### 📖 Background

You work in the civil engineering department of a major university. You are part of a project testing the strength of concrete samples.

Concrete is the most widely used building material in the world. It is a mix of cement and water with gravel and sand. It can also include other materials like fly ash, blast furnace slag, and additives.

The compressive strength of concrete depends on its components and its age, so your team is testing different combinations of ingredients at different time intervals.

The project leader asked you to find a simple way to estimate strength so that students can predict how a particular sample is expected to perform.

### 💾 The data

The team has already tested more than a thousand samples (source):

##### Compressive strength data:
• "cement" - Portland cement in kg/m3
• "slag" - Blast furnace slag in kg/m3
• "fly_ash" - Fly ash in kg/m3
• "water" - Water in liters/m3
• "superplasticizer" - Superplasticizer additive in kg/m3
• "coarse_aggregate" - Coarse aggregate (gravel) in kg/m3
• "fine_aggregate" - Fine aggregate (sand) in kg/m3
• "age" - Age of the sample in days
• "strength" - Concrete compressive strength in megapascals (MPa)

Acknowledgments: I-Cheng Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
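The column list above can double as a quick sanity check when the CSV is loaded. Below is a minimal sketch, assuming the file uses exactly these column names; the helper `check_schema` and the two-row toy frame are hypothetical and only illustrate the idea. It flags missing columns, non-numeric columns, and negative values (no quantity, age, or strength should be below zero).

```python
import pandas as pd

# Column names as listed in the data description
EXPECTED_COLUMNS = [
    "cement", "slag", "fly_ash", "water", "superplasticizer",
    "coarse_aggregate", "fine_aggregate", "age", "strength",
]

def check_schema(frame: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems; an empty list means the frame looks as described."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    present = [c for c in EXPECTED_COLUMNS if c in frame.columns]
    non_numeric = [c for c in present if not pd.api.types.is_numeric_dtype(frame[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Quantities, ages and strengths should never be negative
    numeric = [c for c in present if c not in non_numeric]
    negative = [c for c in numeric if (frame[c] < 0).any()]
    if negative:
        problems.append(f"negative values in: {negative}")
    return problems

# Hypothetical two-row frame, for illustration only (not real measurements)
toy = pd.DataFrame({c: [0.0, 1.0] for c in EXPECTED_COLUMNS})
print(check_schema(toy))                       # → []
print(check_schema(toy.drop(columns="age")))   # → ["missing columns: ['age']"]
```

On the real data one would call `check_schema(df)` right after loading and stop if the list is not empty.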

### 💪 Challenge

Provide your project leader with a formula that estimates the compressive strength. Include:

1. The average strength of the concrete samples at 1, 7, 14, and 28 days of age.
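2. The coefficients $\beta_0, \ldots, \beta_8$ to use in the following formula:

$$
\text{strength} = \beta_0 + \beta_1\,\text{cement} + \beta_2\,\text{slag} + \beta_3\,\text{fly ash} + \beta_4\,\text{water} + \beta_5\,\text{superplasticizer} + \beta_6\,\text{coarse aggregate} + \beta_7\,\text{fine aggregate} + \beta_8\,\text{age}
$$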
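Assuming the requested formula is an ordinary linear model with an intercept $\beta_0$ and one coefficient per column (the seven ingredients plus age), the coefficients can be estimated by ordinary least squares. The sketch below uses NumPy's `np.linalg.lstsq` on synthetic data with made-up "true" coefficients, only to show the mechanics; it is not a fit to the real samples, and `predict_strength` is a hypothetical helper. Since the notebook imports statsmodels, `sm.OLS(y, sm.add_constant(X)).fit().params` would be the equivalent call there.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 8  # seven ingredients plus age

# Synthetic predictors and made-up coefficients (illustration only, not fitted to the real data)
X = rng.uniform(0, 100, size=(n_samples, n_features))
true_beta = np.array([5.0, 0.1, 0.08, 0.05, -0.15, 0.3, 0.01, 0.02, 0.1])  # beta_0 .. beta_8
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, n_samples)

# Prepend a column of ones so the first estimated coefficient is the intercept beta_0
design = np.column_stack([np.ones(n_samples), X])
beta_hat = np.linalg.lstsq(design, y, rcond=None)[0]

def predict_strength(x_row, beta):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_8 * x_8 for one sample."""
    return beta[0] + np.dot(beta[1:], x_row)

print(np.round(beta_hat, 3))
print(predict_strength(X[0], beta_hat), y[0])
```

With low noise the estimates land close to the made-up coefficients, which is a quick way to check that the design matrix and intercept column are set up correctly before running the same steps on the real features.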

## Imports

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from pyod.models.iforest import IForest
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy.stats.mstats import winsorize
from statsmodels.stats.diagnostic import het_breuschpagan

from sklearn.model_selection import train_test_split
#from sklearn.linear_model import LinearRegression
#from sklearn.metrics import mean_squared_error, r2_score

sns.set_style('whitegrid')
```

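```python
# Load the data. The file path is an assumption; point it at wherever the CSV lives.
df = pd.read_csv('data/concrete_data.csv')
df.head()
```

```python
# Drop exact duplicate rows and report how many were removed
original_shape = df.shape[0]
df = df.drop_duplicates()
print(f"Dropped {original_shape - df.shape[0]} rows.")
```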

## EDA

```python
df.describe()
```

```python
# Distribution of every column on a shared value axis
plt.figure(figsize=(16, 6))
sns.boxplot(data=pd.melt(df), y='variable', x='value')
plt.tight_layout()
```
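The boxplot above relies on `pd.melt`, which turns the wide frame into a long one with a `variable` column (the original column name) and a `value` column. Because all columns then share one value axis, columns measured on very different scales get squashed. A minimal sketch on a hypothetical toy frame, assuming standardising each column (subtract the mean, divide by the standard deviation) is an acceptable way to put them on a comparable axis:

```python
import pandas as pd

# Hypothetical toy frame with two columns on very different scales
toy = pd.DataFrame({"cement": [100.0, 300.0, 500.0], "superplasticizer": [0.0, 5.0, 10.0]})

# Wide -> long: one row per (column name, value) pair
long = pd.melt(toy)

# Standardise each column first so every variable is centred at 0 with unit spread
standardised = (toy - toy.mean()) / toy.std()
long_std = pd.melt(standardised)

print(long)
print(long_std)
```

Passing `long_std` instead of `pd.melt(df)` to `sns.boxplot` would compare the shapes of the distributions rather than their raw magnitudes.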

### Estimate average strength of the concrete samples at 1, 7, 14, and 28 days of age.

```python
# Mean strength at the ages the brief asks about
indices = [1, 7, 14, 28]
sdf = df.groupby('age')['strength'].mean()
sdf = sdf[sdf.index.isin(indices)].reset_index()
sdf
```

```python
plt.figure(figsize=(16, 4))
sns.barplot(data=sdf, x='age', y='strength')
plt.tight_layout()
```
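Filtering after the `groupby`, as above, silently drops any requested age that has no samples, so the table could come back with fewer rows than asked for. A minimal sketch of an alternative that filters first and then reindexes, so a missing age shows up as `NaN` in the requested order; `mean_strength_at` and the toy data are hypothetical:

```python
import pandas as pd

def mean_strength_at(frame, ages):
    """Mean strength per requested age; ages with no samples come back as NaN instead of vanishing."""
    return (
        frame[frame["age"].isin(ages)]   # keep only the requested ages
        .groupby("age")["strength"]
        .mean()
        .reindex(ages)                   # restore the requested order, NaN for missing ages
    )

# Hypothetical toy data (not real measurements); note there is no 14-day sample
toy = pd.DataFrame({"age": [1, 7, 7, 28], "strength": [10.0, 20.0, 30.0, 40.0]})
print(mean_strength_at(toy, [1, 7, 14, 28]))
```

On the real data, `mean_strength_at(df, [1, 7, 14, 28])` would produce the same averages as `sdf` whenever every requested age is present.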

### Correlation

```python
features = ['cement', 'slag', 'fly_ash', 'water', 'superplasticizer', 'coarse_aggregate', 'fine_aggregate', 'age']
target = 'strength'
```

```python
# Pairwise Pearson correlations between all columns
plt.figure(figsize=(16, 6))
sns.heatmap(df.corr(), annot=True, linewidths=2)
plt.tight_layout()
```
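The heatmap shows every pair of columns; for choosing predictors, the row that matters most is each feature's correlation with `strength`. A minimal sketch, on hypothetical toy data with a subset of the features, that uses `DataFrame.corrwith` and ranks features by the absolute value of their Pearson correlation. Keep in mind that these are pairwise, linear associations; they do not account for the other ingredients in the mix.

```python
import pandas as pd

features = ["cement", "water", "age"]  # subset of the real feature list, for illustration
target = "strength"

# Hypothetical toy data, not real measurements
toy = pd.DataFrame({
    "cement": [100.0, 200.0, 300.0, 400.0],
    "water": [200.0, 190.0, 170.0, 160.0],
    "age": [28, 7, 28, 90],
    "strength": [10.0, 20.0, 30.0, 40.0],
})

# Pearson correlation of each feature column with the target
corr_with_target = toy[features].corrwith(toy[target])

# Order by strength of association, keeping the sign for interpretation
ranking = corr_with_target.reindex(corr_with_target.abs().sort_values(ascending=False).index)
print(ranking)
```

On the real data, `df[features].corrwith(df[target])` gives the same numbers as the `strength` column of the heatmap, in a form that is easier to sort.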