Competition - predicting concrete strength

Can you predict the strength of concrete?

📖 Background

You work in the civil engineering department of a major university. You are part of a project testing the strength of concrete samples.

Concrete is the most widely used building material in the world. It is a mix of cement and water with gravel and sand. It can also include other materials like fly ash, blast furnace slag, and additives.

The compressive strength of concrete is a function of components and age, so your team is testing different combinations of ingredients at different time intervals.

The project leader asked you to find a simple way to estimate strength so that students can predict how a particular sample is expected to perform.

💾 The data

The team has already tested more than a thousand samples:

Compressive strength data:

"cement" - Portland cement in kg/m3
"slag" - Blast furnace slag in kg/m3
"fly_ash" - Fly ash in kg/m3
"water" - Water in liters/m3
"superplasticizer" - Superplasticizer additive in kg/m3
"coarse_aggregate" - Coarse aggregate (gravel) in kg/m3
"fine_aggregate" - Fine aggregate (sand) in kg/m3
"age" - Age of the sample in days
"strength" - Concrete compressive strength in megapascals (MPa)

Acknowledgments: I-Cheng Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).

import pandas as pd
df = pd.read_csv('data/concrete_data.csv')
df.head()

💪 Challenge

Provide your project leader with a formula that estimates the compressive strength. Include:

The average strength of the concrete samples at 1, 7, 14, and 28 days of age.
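The coefficients $\beta_{0}, \beta_{1}, \ldots, \beta_{8}$ to use in the following formula:

$$
\text{strength} = \beta_{0} + \beta_{1} \cdot \text{cement} + \beta_{2} \cdot \text{slag} + \beta_{3} \cdot \text{fly\_ash} + \beta_{4} \cdot \text{water} + \beta_{5} \cdot \text{superplasticizer} + \beta_{6} \cdot \text{coarse\_aggregate} + \beta_{7} \cdot \text{fine\_aggregate} + \beta_{8} \cdot \text{age}
$$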
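
One way to approach both deliverables is sketched here. It assumes the requested formula is linear, with an intercept plus one coefficient per input column, and that the DataFrame uses the column names listed in the data section. The helper names `mean_strength_by_age` and `fit_linear_formula` are illustrative, and ordinary least squares via `np.linalg.lstsq` is one reasonable fitting choice, not a prescribed method.

```python
import numpy as np
import pandas as pd

# Input columns, in the order given in the data description
FEATURES = ["cement", "slag", "fly_ash", "water", "superplasticizer",
            "coarse_aggregate", "fine_aggregate", "age"]


def mean_strength_by_age(df, ages=(1, 7, 14, 28)):
    """Average strength of the samples tested at each requested age (in days).

    Ages with no samples come back as NaN instead of being dropped.
    """
    subset = df[df["age"].isin(ages)]
    return subset.groupby("age")["strength"].mean().reindex(list(ages))


def fit_linear_formula(df, features=FEATURES, target="strength"):
    """Fit strength = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Returns a Series holding the intercept followed by one coefficient
    per feature. Assumes a linear model is what the formula calls for.
    """
    X = df[features].to_numpy(dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    X = np.column_stack([np.ones(len(X)), X])
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept", *features])
```

With the competition data loaded as `df`, calling `mean_strength_by_age(df)` and `fit_linear_formula(df)` would produce the two requested items.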

🧑‍⚖️ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

✅ Checklist before publishing

Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
Remove redundant cells like the judging criteria, so the workbook is focused on your work.
Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

import pandas as pd
import plotly.express as px
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go

df.info()

1030 entries
No null values
All columns are integers or floats
The target (output) column is strength
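
The observations above can also be checked programmatically instead of read off the `df.info()` output. The helper below is a minimal sketch; the name `summarize_frame` is hypothetical, and it only reports the row count, the total number of nulls, and whether every column has a numeric dtype.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def summarize_frame(df):
    """Return the basic facts usually read from df.info() as a dict."""
    return {
        "rows": len(df),
        # Total count of missing cells across all columns
        "null_values": int(df.isna().sum().sum()),
        # True only if every column is an integer, float, or other numeric dtype
        "all_numeric": all(is_numeric_dtype(dtype) for dtype in df.dtypes),
    }
```

Calling `summarize_frame(df)` on the loaded data gives a quick way to confirm the notes above after any later cleaning step.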

# Count fully duplicated rows versus unique rows
n_dup = df.duplicated().sum()
m = {'duplicated': ['Yes', 'No'], 'count': [n_dup, len(df) - n_dup]}
df_miss = pd.DataFrame(data=m)
df_miss

# Donut chart of duplicated vs. unique rows
fig = px.pie(df_miss, values='count', names='duplicated', hole=0.3,
             color_discrete_sequence=px.colors.sequential.Blackbody,
             width=500, height=500)
fig.show()

df.drop_duplicates(inplace=True)

df.columns

# Box plot of every column on a shared axis to compare ranges and spot outliers
fig = px.box(df, x=df.columns)
fig.update_xaxes(title_text='')
fig.update_yaxes(title_text='')
fig.show()
