Duplicate of Competition - Abalone Seafood Farming

Can you estimate abalone age?

Image(filename='Diseño sin título(1).png')

1. Introduction

Abalone is a shellfish considered a delicacy in many parts of the world. It is an excellent source of iron and pantothenic acid, and it is farmed as a nutritious food resource in Australia, America and East Asia. 100 grams of abalone yields more than 20% of the recommended daily intake of these nutrients. The economic value of abalone is positively correlated with its age, so determining age accurately matters to both farmers and customers when setting a price. However, the current method is costly and inefficient: the shell is cut through the cone, stained, and the rings are counted under a microscope, which is a laborious task. For that reason, other measurements that are easier to obtain are used to predict age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem in full. For this analysis, however, we assume that the abalone's physical measurements are sufficient to provide an accurate age prediction.

Paper objectives:

  1. How does weight change with age for each of the three sex categories?
  2. Can you estimate an abalone's age using its physical characteristics?
  3. Investigate which variables are better predictors of age for abalones.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as lines
from scipy import stats
from scipy.stats import iqr, kurtosis, skew, zscore
from skimage import io
from sklearn.neighbors import LocalOutlierFactor
from IPython.display import Image
from warnings import filterwarnings

# Show every column and row when printing DataFrames
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Silence library warnings in the notebook output
filterwarnings('ignore')

sns.set_style('white')
plt.rcParams['font.family'] = 'monospace'

blues = ['#193f6e','#3b6ba5','#72a5d3','#b1d3e3','#e1ebec']
reds = ['#e61010','#e65010','#e68d10','#e6df10','#c2e610']
cmap_blues = sns.color_palette(blues)
cmap_reds = sns.color_palette(reds)
sns.set_palette(cmap_blues)

print('These are the color palettes I will use:')
sns.palplot(cmap_blues)
sns.palplot(cmap_reds)

2. Data preparation

2.1 Features of data

  • The dataset has 4177 entries and 10 columns:

Feature    | Data Type   | Measurement | Description
sex        | categorical | -           | M, F, and I (Infant)
length     | continuous  | mm          | longest shell measurement
diameter   | continuous  | mm          | perpendicular to the length
height     | continuous  | mm          | measured with meat in the shell
whole_wt   | continuous  | grams       | whole abalone weight
shucked_wt | continuous  | grams       | the weight of abalone meat
viscera_wt | continuous  | grams       | gut-weight
shell_wt   | continuous  | grams       | the weight of the dried shell
rings      | continuous  | -           | number of rings in a shell cross-section
age        | continuous  | -           | the age of the abalone: the number of rings + 1.5
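
The last feature is derived from another one: age is the ring count plus 1.5. A minimal sketch of that derivation follows. The toy rows are invented for illustration and are not dataset records, and `add_age` is a hypothetical helper name; only the `rings` and `age` column names come from the feature list.

```python
import pandas as pd

# Toy rows for illustration only; these are not records from the dataset.
toy = pd.DataFrame({"sex": ["M", "F", "I"], "rings": [15, 9, 7]})


def add_age(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with age = rings + 1.5 (hypothetical helper)."""
    out = df.copy()
    out["age"] = out["rings"] + 1.5
    return out


toy_with_age = add_age(toy)
print(toy_with_age)
```

Working on a copy keeps the input frame unchanged, so the derivation can be rerun safely in a notebook.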

2.2 General information

This section covers the dataset's general information. We first look at the first 5 rows, then review the data types, and finally confirm that there are no duplicate rows and no missing values.
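
The checks described here could be sketched roughly as follows. This is an assumption about what such checks might look like, not the notebook's own cells: `summarize` is a hypothetical helper, and the toy frame is invented so that the example runs on its own (it deliberately contains one duplicate row and one missing value).

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, dtypes, duplicate-row count and missing counts per column."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        # duplicated() flags every repeat of an earlier, identical row
        "n_duplicates": int(df.duplicated().sum()),
        "n_missing": {col: int(n) for col, n in df.isna().sum().items()},
    }


# Invented rows for illustration; not taken from the abalone dataset.
toy = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "length": [90.0, 106.0, np.nan, 90.0],
    "rings": [15, 9, 7, 15],
})

print(toy.head())          # first rows (at most 5)
summary = summarize(toy)
print(summary)
```

On the real data, the same call would be made on the loaded DataFrame, where both `n_duplicates` and every entry of `n_missing` are expected to be zero according to the text.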

print('💠 Are there missing values?\n')
bg_color = '#fbfbfb'
txt_color = '#5c5c5c'

fig, ax = plt.subplots(tight_layout=True, figsize=(12,6))

fig.patch.set_facecolor(bg_color)
ax.set_facecolor(bg_color)

# check for missing values: True marks a missing cell
mv = abalone.isna()
ax = sns.heatmap(data=mv, cmap=cmap_reds, cbar=False, ax=ax)

ax.set_ylabel('')
ax.set_yticks([])
ax.set_xticklabels(labels=mv.columns, size=12,rotation=45)
ax.tick_params(length=0)

fig.text(
    s=':Missing Values',
    x=0, y=1.1,
    fontsize=17, fontweight='bold',
    color=txt_color,
    va='top', ha='left'
)

fig.text(
    s='''
    we can't see any ...
    ''',
    x=0, y=1.075,
    fontsize=11, fontstyle='italic',
    color=txt_color,
    va='top', ha='left'
)

plt.show()

2.3 Data preprocessing

2.3.1 Data typology and single visualization

2.3.1.1 Categorical data

The only categorical feature is sex, which has three subcategories: male, female and infant. As can be seen, the observations are distributed fairly evenly across the three categories. The one noteworthy point is that the female subcategory has a lower mean than the other two.
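
One way to check how evenly a categorical column is distributed is to tabulate counts and shares per category. The sketch below uses an invented toy column rather than the dataset itself, and `category_share` is a hypothetical helper name.

```python
import pandas as pd


def category_share(s: pd.Series) -> pd.DataFrame:
    """Return the count and the relative share of each category in s."""
    counts = s.value_counts()
    share = counts / counts.sum()
    return pd.DataFrame({"count": counts, "share": share})


# Invented labels for illustration; not the real sex column.
toy_sex = pd.Series(["M", "F", "I", "M", "I", "M"], name="sex")
table = category_share(toy_sex)
print(table)
```

Shares close to one third each would support the reading that the three categories are balanced.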