Can you estimate the age of an abalone?
Background
You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of abalones when they go to market.
Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.
The data
You have access to the following historical data (source):
Abalone characteristics:
- "sex" - M, F, and I (infant).
- "length" - longest shell measurement.
- "diameter" - perpendicular to the length.
- "height" - measured with meat in the shell.
- "whole_wt" - whole abalone weight.
- "shucked_wt" - the weight of abalone meat.
- "viscera_wt" - gut weight.
- "shell_wt" - the weight of the dried shell.
- "rings" - number of rings in a shell cross-section.
- "age" - the age of the abalone: the number of rings + 1.5.
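Because the data dictionary defines `age` as `rings + 1.5`, the two columns carry the same information. Below is a minimal consistency check. It assumes the data is loaded into a pandas DataFrame with the column names listed above. The function name `age_is_consistent` and the rows are hypothetical and were made up for illustration.

```python
import numpy as np
import pandas as pd


def age_is_consistent(df: pd.DataFrame) -> bool:
    """True if every row satisfies age == rings + 1.5, as the data dictionary states."""
    # np.isclose guards against any float rounding in the stored 'age' column
    return bool(np.isclose(df["age"], df["rings"] + 1.5).all())


# Hypothetical rows, invented for illustration (not taken from the dataset)
rows = pd.DataFrame({"rings": [15, 9, 7], "age": [16.5, 10.5, 8.5]})
print(age_is_consistent(rows))
```

If the check passes on the real data, `rings` would leak the target and should be left out of any model that predicts `age`.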
Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).
Competition challenge
Create a report that covers the following:
- How does weight change with age for each of the three sex categories?
- Can you estimate an abalone's age using its physical characteristics?
- Investigate which variables are better predictors of age for abalones.
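One common way to approach the second and third questions is a held-out evaluation of a linear regression, using the scikit-learn tools imported below. This is only a sketch, not the report's method. The helper name `fit_age_model`, the 80/20 split, and the synthetic data are assumptions. The synthetic frame only borrows the dataset's column names; in it, age depends on `shell_wt` by construction. `rings` is deliberately not a feature because `age` is derived from it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_age_model(df, features, target="age", seed=42):
    """Fit ordinary least squares on a training split; return (model, test RMSE)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # RMSE computed explicitly, in the same units as age (years)
    rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    return model, rmse


# Synthetic stand-in data (NOT the abalone dataset): age depends on shell_wt plus noise
rng = np.random.default_rng(0)
n = 500
shell_wt = rng.uniform(0.05, 0.6, n)
length = rng.uniform(0.2, 0.8, n)
age = 5 + 20 * shell_wt + rng.normal(0, 1.0, n)
synthetic = pd.DataFrame({"length": length, "shell_wt": shell_wt, "age": age})

model, rmse = fit_age_model(synthetic, ["length", "shell_wt"])
print(dict(zip(["length", "shell_wt"], model.coef_.round(2))), round(rmse, 2))
```

Comparing test RMSE across different feature subsets gives one rough way to rank predictors. Raw coefficients are only comparable when the features share a scale.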
Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.graphics.regressionplots import plot_leverage_resid2
from scipy.stats import shapiro
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

sns.set_style("whitegrid")
Exploratory Data Analysis
Relationship among features
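# Pair plot on a 10% random sample to keep the figure readable
g = sns.pairplot(
    data=data.sample(frac=0.10, random_state=42),
    vars=['length', 'diameter', 'height', 'whole_wt', 'shucked_wt',
          'viscera_wt', 'shell_wt', 'rings', 'age'],
    hue='sex',
    kind='scatter',
    diag_kind='kde',
)
# plt.title would label only the last subplot; title the whole grid instead
g.fig.suptitle("Pair plots of all relevant variables", y=1.02)
plt.show()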
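A pair plot shows the shape of each relationship but is hard to rank by eye. A numeric complement is to compute the Pearson correlation of every numeric column with `age`, sorted by absolute strength. The sketch below does this and leaves out `rings`, which is perfectly correlated with `age` by definition. The function name `correlations_with` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def correlations_with(df, target="age", exclude=("rings",)):
    """Pearson correlation of each numeric column with `target`, strongest (by |r|) first.
    'rings' is excluded by default because age is defined as rings + 1.5."""
    numeric = df.select_dtypes("number").drop(columns=[target, *exclude], errors="ignore")
    return numeric.corrwith(df[target]).sort_values(ascending=False, key=lambda s: s.abs())


# Synthetic data (NOT the abalone dataset): shell_wt tracks rings, height is unrelated
rng = np.random.default_rng(2)
rings = rng.integers(3, 20, 200)
demo = pd.DataFrame({
    "sex": rng.choice(["M", "F", "I"], 200),
    "rings": rings,
    "shell_wt": 0.02 * rings + rng.normal(0, 0.05, 200),
    "height": rng.uniform(0.05, 0.25, 200),
})
demo["age"] = demo["rings"] + 1.5

result = correlations_with(demo)
print(result.round(2))
```

Pearson correlation captures only linear association. `method="spearman"` in `corrwith` is an option if the weight-age curves in the plot look strongly non-linear.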
Relationship between weight, age, and sex
Q1: How does weight change with age for each of the three sex categories?
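One way to summarize this question numerically is to tabulate median weight per age band for each sex, alongside a per-sex rank correlation between weight and age. The sketch below shows both. The age-band edges, the helper names, and the synthetic data are assumptions made for illustration, not taken from the report.

```python
import numpy as np
import pandas as pd


def weight_by_age_and_sex(df, weight="whole_wt", bins=(0, 5, 10, 15, 20, 35)):
    """Median weight per age band (rows) for each sex (columns). Bin edges are illustrative."""
    bands = pd.cut(df["age"], bins=list(bins))
    return (
        df.assign(age_band=bands)
          .groupby(["age_band", "sex"], observed=True)[weight]
          .median()
          .unstack("sex")
    )


def weight_age_spearman_by_sex(df, weight="whole_wt"):
    """Spearman rank correlation between weight and age within each sex."""
    return pd.Series({sex: g[weight].corr(g["age"], method="spearman")
                      for sex, g in df.groupby("sex")})


# Synthetic data (NOT the abalone dataset): infants are younger, weight grows with rings
rng = np.random.default_rng(3)
n = 300
sex = rng.choice(["M", "F", "I"], n)
rings = np.where(sex == "I", rng.integers(3, 10, n), rng.integers(6, 20, n))
demo = pd.DataFrame({"sex": sex, "age": rings + 1.5,
                     "whole_wt": 0.06 * rings + rng.normal(0, 0.1, n)})

table = weight_by_age_and_sex(demo)
rho = weight_age_spearman_by_sex(demo)
print(table.round(2))
print(rho.round(2))
```

Medians are used instead of means so that a few unusually heavy or light shells do not dominate a sparsely populated band.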
Answer