Duplicate of Competition - Abalone Seafood Farming
Can you estimate abalone age?

``Image(filename='Diseño sin título(1).png')``

1. Introduction

Abalone is a shellfish considered a delicacy in many parts of the world. It is an excellent source of iron and pantothenic acid, and it is farmed as a nutritious food resource in Australia, America and East Asia. 100 grams of abalone yields more than 20% of the recommended daily intake of these nutrients. The economic value of abalone is positively correlated with its age, so determining age accurately matters to both farmers and customers when setting a price. However, the current method is costly and inefficient: the shell is cut through the cone, stained, and the rings are counted under a microscope, a laborious task. Other measurements that are easier to obtain are therefore used to predict age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem. For this analysis, however, we assume that the abalone's physical measurements are sufficient to provide an accurate age prediction.

Paper objectives:

1. How does weight change with age for each of the three sex categories?
2. Can you estimate an abalone's age using its physical characteristics?
3. Investigate which variables are better predictors of age for abalones.
``````
# Core data and plotting libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as lines
from skimage import io

# Statistics and outlier detection
from scipy import stats
from scipy.stats import iqr, skew, kurtosis, zscore
from sklearn.neighbors import LocalOutlierFactor

from IPython.display import Image
from warnings import filterwarnings

# Show every row and column when displaying DataFrames
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

filterwarnings('ignore')

sns.set_style('white')
plt.rcParams['font.family'] = 'monospace'

# Color palettes used throughout the notebook
blues = ['#193f6e', '#3b6ba5', '#72a5d3', '#b1d3e3', '#e1ebec']
reds = ['#e61010', '#e65010', '#e68d10', '#e6df10', '#c2e610']
cmap_blues = sns.color_palette(blues)
cmap_reds = sns.color_palette(reds)
sns.set_palette(cmap_blues)

print('These are the color palettes I will use:')
sns.palplot(cmap_blues)
sns.palplot(cmap_reds)
``````

2. Data preparation

2.1 Features of data

• The dataset has 4177 entries and 10 columns:

| Feature | Data Type | Measurement | Description |
|---|---|---|---|
| `sex` | categorical | | M, F, and I (Infant) |
| `length` | continuous | mm | longest shell measurement |
| `diameter` | continuous | mm | perpendicular to the length |
| `height` | continuous | mm | measured with meat in the shell |
| `whole_wt` | continuous | grams | whole abalone weight |
| `shucked_wt` | continuous | grams | the weight of abalone meat |
| `viscera_wt` | continuous | grams | gut-weight |
| `shell_wt` | continuous | grams | the weight of the dried shell |
| `rings` | continuous | | number of rings in a shell cross-section |
| `age` | continuous | | the age of the abalone: the number of rings + 1.5 |
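
The `age` column follows directly from `rings`, so it can be recomputed or checked at any time. The snippet below is a minimal sketch of that relationship; the `sample` rows and the `add_age` helper are hypothetical and only illustrate the formula from the table.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has 4177 entries.
sample = pd.DataFrame({"rings": [15, 7, 9]})


def add_age(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with age = rings + 1.5, as defined in the feature table."""
    out = df.copy()
    out["age"] = out["rings"] + 1.5
    return out


sample_with_age = add_age(sample)
print(sample_with_age)
```

Because the two columns differ only by a constant, a model that predicts `rings` also predicts `age`, and only one of them should be used as the target.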

2.2 General information

We now look at the general information of the dataset. First we display the first 5 rows, then we review the data type of each column, and finally we confirm that there are no duplicate rows and no missing values.
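
The checks described above can be sketched with a few pandas calls. This is an illustrative example, not the notebook's own code: the `toy` DataFrame and the `summarize` helper are hypothetical, and `toy` deliberately contains one duplicated row so that the duplicate check has something to report.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `abalone` DataFrame.
# Rows 0 and 3 are identical on purpose.
toy = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "length": [91.0, 106.0, 66.0, 91.0],
    "rings": [15, 9, 7, 15],
})


def summarize(df: pd.DataFrame, n: int = 5) -> dict:
    """Collect the basic checks: first rows, dtypes, duplicates, missing values."""
    return {
        "head": df.head(n),                          # first n rows
        "dtypes": df.dtypes,                         # data type of each column
        "n_duplicates": int(df.duplicated().sum()),  # fully repeated rows
        "n_missing": df.isna().sum(),                # missing values per column
    }


summary = summarize(toy)
print(summary["head"])
print(summary["dtypes"])
print("Duplicate rows:", summary["n_duplicates"])
print(summary["n_missing"])
```

On the real dataset the same calls would be run on `abalone` directly, for example `abalone.duplicated().sum()`.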

``````
print('💠 Are there missing values?\n')

# Figure colors
bg_color = '#fbfbfb'
txt_color = '#5c5c5c'

# Check for missing values; `abalone` is the DataFrame loaded in an earlier cell
fig, ax = plt.subplots(tight_layout=True, figsize=(12, 6))
fig.patch.set_facecolor(bg_color)
ax.set_facecolor(bg_color)

# Boolean mask of missing cells, drawn as a heatmap with one column per feature
mv = abalone.isna()
ax = sns.heatmap(data=mv, cmap=cmap_reds, cbar=False, ax=ax)

ax.set_ylabel('')
ax.set_yticks([])
ax.set_xticklabels(labels=mv.columns, size=12, rotation=45)
ax.tick_params(length=0)

# Title above the axes
fig.text(
    s='Missing Values',
    x=0, y=1.1,
    fontsize=17, fontweight='bold',
    color=txt_color,
    va='top', ha='left'
)

# Subtitle
fig.text(
    s='''
    we can't see any ...
    ''',
    x=0, y=1.075,
    fontsize=11, fontstyle='italic',
    color=txt_color,
    va='top', ha='left'
)

plt.show()
``````

2.3.1 Data typology and single visualization

2.3.1.1 Categorical data

The only categorical feature is `sex`. It is divided into three subcategories: male, female and infant. As can be seen, the observations are spread fairly evenly across the three categories. The one noteworthy point is that the female subcategory has a lower mean than the other two.
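
One way to check how evenly a categorical feature is spread is to count the observations in each category and convert the counts to shares. The sketch below uses a small hypothetical `sex` series rather than the real column, so the numbers it prints say nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical values; in the notebook this would be abalone["sex"].
sex = pd.Series(["M", "F", "I", "M", "I", "M"], name="sex")

counts = sex.value_counts()                         # observations per category
shares = sex.value_counts(normalize=True).round(3)  # fraction of the total

balance = pd.DataFrame({"count": counts, "share": shares})
print(balance)
```

If the shares are close to one another, the categories are roughly balanced and no resampling is needed before comparing them.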