Duplicate of Competition - Abalone Seafood Farming
Can you estimate abalone age?
from IPython.display import Image
Image(filename='Diseño sin título(1).png')
1. Introduction
Abalone is a shellfish considered a delicacy in many parts of the world. It is an excellent source of iron and pantothenic acid, and a nutritious food resource that is farmed in Australia, America and East Asia: 100 grams of abalone provide more than 20% of the recommended daily intake of these nutrients. The economic value of abalone is positively correlated with its age, so determining its age accurately is important for both farmers and customers when setting its price. However, the current method for determining age is costly and inefficient: the shell is cut through the cone, stained, and the number of rings is counted through a microscope, which is a laborious task. Other measurements, which are easier to obtain, can be used to predict the age instead. Further information, such as weather patterns and location (and hence food availability), may be required to solve the problem fully. For this problem, however, we shall assume that the abalone's physical measurements are sufficient to provide an accurate age prediction.
Paper objectives:
- How does weight change with age for each of the three sex categories?
- Can you estimate an abalone's age using its physical characteristics?
- Investigate which variables are better predictors of age for abalones.
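One simple first step toward the third objective is to rank the physical measurements by the strength of their linear correlation with age. The sketch below does this with pandas `DataFrame.corrwith`. It runs on a small, made-up DataFrame: the `toy` name and every value in it are hypothetical. It only captures linear association and is meant as a quick screen, not a full answer to the objective.

```python
import pandas as pd

# Hypothetical, made-up rows for illustration only (not the real abalone data)
toy = pd.DataFrame({
    "length":   [91, 70, 106, 88, 66],   # mm
    "shell_wt": [30, 14, 42, 31, 11],    # grams
    "rings":    [15, 7, 9, 10, 7],
})
toy["age"] = toy["rings"] + 1.5          # age = rings + 1.5, as in the data dictionary

# Absolute Pearson correlation of each measurement with age, strongest first
predictors = toy.drop(columns=["rings", "age"])  # rings is excluded: age is derived from it
ranking = predictors.corrwith(toy["age"]).abs().sort_values(ascending=False)
print(ranking)
```

With the real `abalone` DataFrame, the same two lines applied to all numeric measurement columns would give a first ordering of candidate predictors.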
# Data handling
import numpy as np
import pandas as pd
# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as lines
from IPython.display import Image
from skimage import io
# Statistics and outlier detection
from scipy import stats
from scipy.stats import iqr, skew, kurtosis, zscore
from sklearn.neighbors import LocalOutlierFactor
from warnings import filterwarnings
# Display and warning settings
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
filterwarnings('ignore')
# Plot style
sns.set_style('white')
plt.rcParams['font.family'] = 'monospace'
# Custom color palettes used throughout the notebook
blues = ['#193f6e', '#3b6ba5', '#72a5d3', '#b1d3e3', '#e1ebec']
reds = ['#e61010', '#e65010', '#e68d10', '#e6df10', '#c2e610']
cmap_blues = sns.color_palette(blues)
cmap_reds = sns.color_palette(reds)
sns.set_palette(cmap_blues)
print('These are the color palettes I will use:')
sns.palplot(cmap_blues)
sns.palplot(cmap_reds)
2. Data preparation
2.1 Features of data
- The dataset has 4177 entries and 10 columns:
| Feature | Data Type | Measurement | Description |
|---|---|---|---|
| sex | categorical | M, F, and I (infant) | sex of the abalone |
| length | continuous | mm | longest shell measurement |
| diameter | continuous | mm | perpendicular to the length |
| height | continuous | mm | measured with meat in the shell |
| whole_wt | continuous | grams | whole abalone weight |
| shucked_wt | continuous | grams | weight of the abalone meat |
| viscera_wt | continuous | grams | gut weight |
| shell_wt | continuous | grams | weight of the dried shell |
| rings | discrete | count | number of rings in a shell cross-section |
| age | continuous | years | age of the abalone: number of rings + 1.5 |
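The `age` column follows directly from `rings`, as the table states. Here is a minimal sketch of that derivation. It assumes a DataFrame with an integer `rings` column, and the ring counts below are hypothetical.

```python
import pandas as pd

# Hypothetical ring counts; in the notebook these would come from abalone["rings"]
df = pd.DataFrame({"rings": [15, 7, 9]})

# Age in years = number of rings + 1.5
df["age"] = df["rings"] + 1.5
print(df)
```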
2.2 General information
In this section we review the general information of the dataset. First we look at its first five rows, then at the data type of each column, and finally we confirm that there are no duplicate rows and no missing values.
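The checks described above correspond to a few standard pandas calls. This is a minimal sketch on a hypothetical four-row stand-in for `abalone` (the real DataFrame is loaded in a hidden cell), so the printed values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the `abalone` DataFrame (made-up values)
abalone = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "length": [91.0, 70.0, 106.0, 88.0],
    "rings": [15, 7, 9, 10],
})

print(abalone.head())                    # first 5 rows
print(abalone.dtypes)                    # data type of each column
n_duplicates = abalone.duplicated().sum()  # rows that repeat an earlier row exactly
n_missing = abalone.isna().sum().sum()     # missing cells across all columns
print(f"Duplicate rows: {n_duplicates}, missing values: {n_missing}")
```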
print('💠 Are there missing values?\n')
# Figure colors
bg_color = '#fbfbfb'
txt_color = '#5c5c5c'
# Boolean mask: True wherever a value is missing
mv = abalone.isna()
fig, ax = plt.subplots(tight_layout=True, figsize=(12, 6))
fig.patch.set_facecolor(bg_color)
ax.set_facecolor(bg_color)
# Heatmap of the mask: one row per entry, one column per feature
sns.heatmap(data=mv, cmap=cmap_reds, cbar=False, ax=ax)
ax.set_ylabel('')
ax.set_yticks([])
ax.set_xticklabels(labels=mv.columns, size=12, rotation=45)
ax.tick_params(length=0)
# Title and subtitle placed above the axes
fig.text(
    s='Missing Values',
    x=0, y=1.1,
    fontsize=17, fontweight='bold',
    color=txt_color,
    va='top', ha='left'
)
fig.text(
    s='''
    we can't see any missing values
    ''',
    x=0, y=1.075,
    fontsize=11, fontstyle='italic',
    color=txt_color,
    va='top', ha='left'
)
plt.show()
2.3 Data preprocessing
2.3.1 Data types and univariate visualization
2.3.1.1 Categorical data
The only categorical feature is sex, which is divided into three subcategories: male (M), female (F) and infant (I). As can be seen, the three categories are fairly evenly represented. The one noteworthy point is that the female subcategory has slightly fewer observations than the other two.
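The balance between the three categories can also be checked numerically with a frequency table. This is a minimal sketch that assumes a `sex` column coded as M, F and I. The sample below is hypothetical, so its counts do not reflect the real dataset.

```python
import pandas as pd

# Hypothetical sample of the sex column; the real one would be abalone["sex"]
sex = pd.Series(["M", "F", "I", "M", "I", "F", "M"], name="sex")

counts = sex.value_counts()                 # absolute frequency per category
shares = sex.value_counts(normalize=True)   # relative frequency, sums to 1
print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
```

Reading the shares rather than the raw counts makes it easier to judge whether one category is meaningfully under-represented.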