Can you predict the strength of concrete?
📖 Background
You work in the civil engineering department of a major university. You are part of a project testing the strength of concrete samples.
Concrete is the most widely used building material in the world. It is a mix of cement and water with gravel and sand. It can also include other materials like fly ash, blast furnace slag, and additives.
The compressive strength of concrete is a function of components and age, so your team is testing different combinations of ingredients at different time intervals.
The project leader asked you to find a simple way to estimate strength so that students can predict how a particular sample is expected to perform.
💾 The data
The team has already tested more than a thousand samples (source):
Compressive strength data:
- "cement" - Portland cement in kg/m³
- "slag" - Blast furnace slag in kg/m³
- "fly_ash" - Fly ash in kg/m³
- "water" - Water in liters/m³
- "superplasticizer" - Superplasticizer additive in kg/m³
- "coarse_aggregate" - Coarse aggregate (gravel) in kg/m³
- "fine_aggregate" - Fine aggregate (sand) in kg/m³
- "age" - Age of the sample in days
- "strength" - Concrete compressive strength in megapascals (MPa)
Acknowledgments: I-Cheng Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
import pandas as pd
df = pd.read_csv('data/concrete_data.csv')
df.head()
💪 Challenge
Provide your project leader with a formula that estimates the compressive strength. Include:
- The average strength of the concrete samples at 1, 7, 14, and 28 days of age.
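- The coefficients $\beta_0, \beta_1, \ldots, \beta_8$ to use in the following formula:
$$\text{strength} = \beta_0 + \beta_1 \cdot \text{cement} + \beta_2 \cdot \text{slag} + \beta_3 \cdot \text{fly\_ash} + \beta_4 \cdot \text{water} + \beta_5 \cdot \text{superplasticizer} + \beta_6 \cdot \text{coarse\_aggregate} + \beta_7 \cdot \text{fine\_aggregate} + \beta_8 \cdot \text{age}$$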
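The first deliverable, the average strength at selected ages, can be sketched with a pandas filter and group-by. The small `toy` frame below is made up purely for illustration and does not contain values from the real dataset; in the notebook, `df` would take its place.

```python
import pandas as pd

# Toy rows for illustration only; NOT values from the real concrete dataset.
toy = pd.DataFrame({
    "age": [1, 1, 7, 7, 14, 28, 28, 90],
    "strength": [10.0, 12.0, 20.0, 24.0, 30.0, 35.0, 37.0, 45.0],
})

# Ages (in days) requested in the challenge.
target_ages = [1, 7, 14, 28]

# Keep only the requested ages, then average strength within each age.
mean_by_age = (
    toy[toy["age"].isin(target_ages)]
    .groupby("age")["strength"]
    .mean()
)
print(mean_by_age)
```

Filtering before grouping keeps ages outside the requested set (such as the 90-day row here) out of the result.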
🧑⚖️ Judging criteria
This is a community-based competition. The top 5 most upvoted entries will win.
The winners will receive DataCamp merchandise.
✅ Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your work.
- Check that all the cells run without error.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import scipy.stats as stats
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.compat import lzip
from mpl_toolkits.mplot3d import Axes3D
df = pd.read_csv('data/concrete_data.csv')
df.index
df
df.agg(['max', 'min', 'mean'])
X = df[["cement", "slag", "fly_ash", "water", "superplasticizer", "coarse_aggregate", "fine_aggregate", "age"]]
Y = df["strength"]
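# Kitchen-sink model with a constant (the intercept corresponds to beta_0)
X2 = sm.add_constant(X)
model = sm.OLS(Y, X2)
model_res = model.fit()
model_res.summary()
# Kitchen-sink model without a constant
# Note: without a constant, statsmodels reports the uncentered R-squared,
# so it is not directly comparable to the R-squared of the model above.
model2 = sm.OLS(Y, X)
model2_res = model2.fit()
model2_res.summary()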
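To show how fitted coefficients turn into the requested formula, here is a minimal sketch. So that it runs on its own, it uses synthetic data with only two predictors and NumPy's least-squares solver instead of statsmodels. In the notebook, `model_res.params` should hold the corresponding intercept and slopes. The `predict` helper and the `true_beta` values are hypothetical and exist only for this illustration.

```python
import numpy as np

# Synthetic data for illustration only; NOT the real concrete dataset.
rng = np.random.default_rng(0)
n = 200
cement = rng.uniform(100, 500, n)   # kg/m^3
age = rng.uniform(1, 90, n)         # days
true_beta = np.array([5.0, 0.1, 0.2])  # made-up intercept and slopes
strength = (true_beta[0] + true_beta[1] * cement + true_beta[2] * age
            + rng.normal(0, 1.0, n))

# Design matrix with a leading column of ones for the intercept (beta_0).
A = np.column_stack([np.ones(n), cement, age])
beta, *_ = np.linalg.lstsq(A, strength, rcond=None)

def predict(cement_kg, age_days):
    """Evaluate strength = beta_0 + beta_1 * cement + beta_2 * age."""
    return beta[0] + beta[1] * cement_kg + beta[2] * age_days

print(beta)
print(predict(300.0, 28.0))
```

The same pattern extends to all eight predictors: one coefficient per column, plus the intercept.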
# Checking for collinearity
sns.pairplot(X)
df.corr().round(2)
# Residuals
sns.boxplot(x=model_res.resid, showmeans=True)
plt.show()
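The pair plot and correlation matrix only show pairwise relationships. A common complementary check is the variance inflation factor (VIF), which measures how well each predictor is explained by all the others together. The sketch below computes VIFs as the diagonal of the inverse of the predictor correlation matrix, on synthetic columns. The helper name `vif_from_correlation` is hypothetical. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which could be used in the notebook instead.

```python
import numpy as np
import pandas as pd

def vif_from_correlation(frame: pd.DataFrame) -> pd.Series:
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)

# Synthetic predictors for illustration only: "c" is nearly a copy of "a",
# while "b" is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
toy = pd.DataFrame({"a": a, "b": b, "c": c})

vifs = vif_from_correlation(toy)
print(vifs.round(1))
```

A VIF near 1 suggests little collinearity; values above roughly 5 to 10 are often treated as a warning sign, though the threshold is a rule of thumb. In the notebook, passing `X` to the helper would give one VIF per mix component.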