Is it possible to estimate the age of an abalone?

📖 Background

Japan has a well-developed seafood market, and abalone farming is a significant part of it. For operational and environmental reasons, it is important to estimate the age of the abalones when they go to market.

Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.

A key decision in the analysis design is whether to predict the age of a live abalone or to use all of the given characteristics, including those available only once the abalone has been prepared for the seafood market. This analysis focuses on predictions based on all of the data; predicting age only from measurements that can be taken on a live abalone could be a promising future study.
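The two designs differ only in which columns are passed to a model, which can be sketched as below. This is a minimal illustration, not the analysis itself: the split into "live" and "market-only" measurements is an assumption (that the three component weights require shucking), `select_features` is a hypothetical helper, the toy rows are made up, and `sex` is left out for brevity.

```python
import pandas as pd

# Assumption: these can be measured without opening the abalone
LIVE_FEATURES = ['length', 'diameter', 'height', 'whole_wt']
# Assumption: these are only available after shucking
MARKET_ONLY_FEATURES = ['shucked_wt', 'viscera_wt', 'shell_wt']
TARGET = 'age'


def select_features(df, live_only=False):
    """Hypothetical helper: return (X, y) for the chosen design."""
    cols = LIVE_FEATURES if live_only else LIVE_FEATURES + MARKET_ONLY_FEATURES
    return df[cols], df[TARGET]


# Made-up rows, for illustration only
toy = pd.DataFrame({
    'length': [0.45, 0.53], 'diameter': [0.36, 0.42],
    'height': [0.10, 0.13], 'whole_wt': [0.51, 0.68],
    'shucked_wt': [0.22, 0.26], 'viscera_wt': [0.10, 0.14],
    'shell_wt': [0.15, 0.21], 'age': [16.5, 10.5],
})

X_live, y = select_features(toy, live_only=True)
X_all, _ = select_features(toy)
print(X_live.shape, X_all.shape)
```

Note that `rings` is excluded from both feature sets, since `age` is derived directly from it.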

💾 The data

The dataset was made from the following historical data (source):

Abalone characteristics
Variable      Explanation
sex           M, F, and I (infant)
length        longest shell measurement
diameter      perpendicular to the length
height        measured with meat in the shell
whole_wt      whole abalone weight
shucked_wt    the weight of abalone meat
viscera_wt    gut weight
shell_wt      the weight of the dried shell
rings         number of rings in a shell cross-section
age           the age of the abalone: the number of rings + 1.5
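
The relation between `rings` and `age` stated in the table can be written out directly. A minimal sketch, using made-up ring counts rather than rows from the real dataset:

```python
import pandas as pd

# Hypothetical ring counts, for illustration only
toy = pd.DataFrame({'rings': [7, 10, 15]})

# age = number of rings + 1.5, as defined in the table
toy['age'] = toy['rings'] + 1.5
print(toy)
```

Because `age` is a fixed offset of `rings`, predicting one is equivalent to predicting the other, and `rings` should not be used as a predictor of `age`.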

Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).

Imports and settings
%%capture
# Install the required packages; %pip targets the running kernel's environment
%pip install synthia pyvinecopulib tensorflow seaborn
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from seaborn import axes_style
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import synthia as syn
import pyvinecopulib as pv

print(sns.__version__)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Render plots inline and apply a shared plot style
%matplotlib inline
rc_params = {**axes_style('whitegrid'),
             'legend.markerscale': 3,
             'grid.linestyle': ':',
             'axes.spines.top': False,
             'axes.spines.right': False}
mpl.rcParams.update(rc_params)
# mpl.cm.get_cmap was removed in Matplotlib 3.9; use the colormap registry
cmap = mpl.colormaps['plasma']

# Make NumPy and pandas printouts easier to read
np.set_printoptions(precision=2, suppress=True)
pd.set_option('display.precision', 2)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Dense, Concatenate, 
                                     Embedding, Flatten, Normalization)
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
Read and process the dataset
abalone = pd.read_csv('./data/abalone.csv', 
                      dtype={'sex': 'category'})
abalone.info()
abalone.sample(n=10)

Data exploration

abalone.describe().T
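
Since `sex` is categorical, `describe()` on the numeric columns does not show how the groups differ. One possible follow-up is a per-group summary. This is a sketch on made-up rows, not output from the real dataset; the column names follow the table above.

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    'sex': pd.Categorical(['M', 'F', 'I', 'M', 'F', 'I']),
    'length': [0.45, 0.53, 0.33, 0.55, 0.60, 0.30],
    'rings': [15, 9, 7, 10, 11, 6],
})

# Count and mean ring number per sex category
by_sex = toy.groupby('sex', observed=True)['rings'].agg(['count', 'mean'])
print(by_sex)
```

On the real data, the same call with `abalone` in place of `toy` would show whether infants (`I`) have noticeably fewer rings than adults.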