Competition - Abalone Seafood Farming
  • Can you estimate the age of an abalone?

    📖 Background

    You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of the abalones when they go to market.

    Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.

    💾 The data

    You have access to the following historical data (source):

    Abalone characteristics:
    • "sex" - M, F, and I (infant).
    • "length" - longest shell measurement.
    • "diameter" - perpendicular to the length.
    • "height" - measured with meat in the shell.
    • "whole_wt" - whole abalone weight.
    • "shucked_wt" - the weight of abalone meat.
    • "viscera_wt" - gut-weight.
    • "shell_wt" - the weight of the dried shell.
    • "rings" - number of rings in a shell cross-section.
    • "age" - the age of the abalone: the number of rings + 1.5.
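The last bullet defines age as a fixed offset from the ring count. A minimal sketch of that relationship follows; the ring counts are made up for illustration and are not rows from the dataset, and the name `toy` is hypothetical.

```python
import pandas as pd

# Illustrative ring counts only; these are not rows from abalone.csv.
toy = pd.DataFrame({"rings": [7, 10, 15]})

# age = rings + 1.5, as stated in the data description.
toy["age"] = toy["rings"] + 1.5

print(toy)
```

Because age is a deterministic function of rings, the two columns carry the same information, so rings should not be used as a predictor when the target is age.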

    Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).

    💪 Competition challenge

    Create a report that covers the following:

    1. How does weight change with age for each of the three sex categories?
    2. Can you estimate an abalone's age using its physical characteristics?
    3. Investigate which variables are better predictors of age for abalones.
    #import relevant libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    #Load dataset
    raw_data = pd.read_csv('data/abalone.csv')
    #View the first five observations
    raw_data.head()
    #Check out the data (column dtypes and non-null counts)
    raw_data.info()
    #Summary of the data
    raw_data.describe()


    • Every feature has the same count (4177), which means there are no missing values.
    • The numerical features are stored as float or int, and the only categorical feature ('sex') already has dtype object, so every feature has an appropriate type.
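Both observations can also be checked directly rather than read off the summary tables. The sketch below is one possible way to do that; `check_quality` is a hypothetical helper, and the toy frame holds made-up values standing in for `raw_data`.

```python
import pandas as pd


def check_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype and the number of missing values for each column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
    })


# Toy frame with made-up values, standing in for raw_data.
toy = pd.DataFrame({
    "sex": ["M", "F", "I"],
    "length": [0.455, 0.530, 0.330],
    "rings": [15, 9, 7],
})

report = check_quality(toy)
print(report)
```

On the real data the same call would be `check_quality(raw_data)`; a column of zeros under `n_missing` would confirm the first bullet.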
    #Creating a checkpoint
    data = raw_data.copy()

    Exploratory data analysis

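    #Visualizing the relationship between features
    g = sns.pairplot(data,
                 vars=['length', 'diameter', 'height', 'whole_wt', 'shucked_wt', 'viscera_wt', 'shell_wt',
                       'rings', 'age'])
    #Put the title on the whole figure; plt.title would only label the last subplot
    g.fig.suptitle('The relationship between the variables', y=1.02)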


    • The pairplot shows how each pair of features relates to the others. This step matters because strongly correlated predictors (multicollinearity) cause problems at the modelling stage.
    • From the pairplot we see that some pairs of features are strongly positively correlated, while others have a non-linear relationship.
    • Most of the features look approximately normally distributed and outliers have little influence, except for 'height'.
    • 'height' is right-skewed; most of its values fall below 0.23.
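The visual impressions above (strong correlation between some features, right skew in one of them) can be backed by numbers. The following is a minimal sketch on synthetic data, not on the abalone file: the columns are generated so that `diameter` is nearly collinear with `length` and `height` is right-skewed by construction, purely to show what the two statistics look like in those cases.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the abalone data (made-up numbers, fixed seed).
rng = np.random.default_rng(0)
length = rng.uniform(0.1, 0.8, size=200)
toy = pd.DataFrame({
    "length": length,
    # Nearly collinear with length by construction.
    "diameter": 0.8 * length + rng.normal(0, 0.01, size=200),
    # Log-normal draws have a long right tail by construction.
    "height": rng.lognormal(mean=-2.0, sigma=0.5, size=200),
})

corr = toy.corr()   # pairwise Pearson correlations
skew = toy.skew()   # sample skewness; positive means a longer right tail

print(corr.round(3))
print(skew.round(3))
```

On the notebook's data, the equivalent calls would be `data.select_dtypes('number').corr()` and `data.select_dtypes('number').skew()`. Correlations close to 1 between predictors flag the multicollinearity mentioned above.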

    Visualising the relationship between weight, age and sex

    #Bar chart of mean whole weight by sex
    sns.catplot(data=data, x='sex', y='whole_wt', kind='bar', ci=None, order=['I','M','F'])
    #Box plot of age by sex; whiskers span the 2.5th to 97.5th percentiles
    sns.catplot(x='sex', y='age', data=data, kind='box', whis=[2.5, 97.5])
    #Scatter plot of whole_wt against age 
    sns.relplot(x='whole_wt', y='age', data=data, col='sex', col_wrap=2)
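The plots above can be complemented with a numeric summary of how weight relates to age within each sex category. This is a sketch under stated assumptions: the rows are invented for illustration (not taken from the dataset), and Pearson correlation is only one possible measure, chosen here for simplicity.

```python
import pandas as pd

# Made-up rows for illustration; not taken from abalone.csv.
toy = pd.DataFrame({
    "sex":      ["I", "I", "I", "M", "M", "M", "F", "F", "F"],
    "age":      [5.5, 7.5, 9.5, 8.5, 10.5, 12.5, 9.5, 11.5, 13.5],
    "whole_wt": [0.10, 0.25, 0.45, 0.60, 0.95, 1.20, 0.70, 1.05, 1.30],
})

# Mean whole weight per sex category.
mean_wt = toy.groupby("sex")["whole_wt"].mean()

# Pearson correlation between whole weight and age within each sex.
corr_by_sex = {
    sex: grp["whole_wt"].corr(grp["age"])
    for sex, grp in toy.groupby("sex")
}

print(mean_wt)
print(corr_by_sex)
```

Replacing `toy` with `data` would give the per-sex figures for the real measurements. Pearson correlation only captures the linear part of the relationship, so any curvature visible in the scatter plots would be understated.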