Competition - Abalone Seafood Farming
  • Can you estimate the age of an abalone?

    👩🏼‍💼 Introduction

    Background: You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, estimating the age of the abalones when they go to market is an important consideration.

    Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.

    🚧 Loading packages

    %pip install category_encoders
    %pip install colored
    import pandas as pd
    import numpy as np
    
    from termcolor import colored
    from colored import fore, back, style
    
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    import scipy as sp
    from scipy import stats
    
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import GridSearchCV
    from sklearn.compose import TransformedTargetRegressor
    
    from sklearn.compose import make_column_selector
    from sklearn.compose import make_column_transformer
    
    from sklearn.pipeline import make_pipeline,Pipeline
    
    from sklearn.preprocessing import PowerTransformer
    from sklearn.preprocessing import OneHotEncoder
    
    import category_encoders as ce
    
    import xgboost as xgb
    from sklearn.linear_model import LinearRegression,Lasso,Ridge
    
    from sklearn.metrics import mean_squared_error, r2_score

    💾 Loading data

    abalone = pd.read_csv('./data/abalone.csv')
    abalone.head()
    

    Abalone characteristics:

    • "sex" - M, F, and I (infant).
    • "length" - longest shell measurement.
    • "diameter" - perpendicular to the length.
    • "height" - measured with meat in the shell.
    • "whole_wt" - whole abalone weight.
    • "shucked_wt" - the weight of abalone meat.
    • "viscera_wt" - gut-weight.
    • "shell_wt" - the weight of the dried shell.
    • "rings" - number of rings in a shell cross-section.
    • "age" - the age of the abalone: the number of rings + 1.5.
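    The rule age = rings + 1.5 can be checked directly on the data. The sketch below uses a small hypothetical DataFrame with illustrative values instead of `./data/abalone.csv`. On the real data, the same comparison applies to the `abalone` frame loaded above.

```python
import pandas as pd

# Hypothetical stand-in for abalone.csv: illustrative values for the two relevant columns.
abalone = pd.DataFrame({"rings": [15, 7, 9, 10], "age": [16.5, 8.5, 10.5, 11.5]})

# Recompute age from the documented rule: age = rings + 1.5
expected_age = abalone["rings"] + 1.5

# True only if every row follows the rule (floats compared with a tolerance)
age_rule_holds = bool(((abalone["age"] - expected_age).abs() < 1e-9).all())
print(age_rule_holds)  # True
```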

    Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).

    # Check for missing values in each column
    
    abalone.isna().sum()

    👩🏼‍🏫 Observation:

    ➜ The abalone dataset has no missing values.

    ๐Ÿ„๐Ÿผโ€โ™€๏ธ Let's go to the analysis

    Part A: How does weight change with age for each of the three sex categories?

    📖 Description

    📌 We have to consider three sex categories and four weight measurements:

    ➜ Sexes: male, female, and infant

    ➜ Weights: whole, shucked, viscera, and shell

    📌 What are we trying to find?

    ➜ How does weight change with age for each of the three sex categories?

    📌 What are the steps?

    ➜ A graphical visualization: scatter plots

    ➜ A statistical test to confirm or refute the hypothesis

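    print(colored('                           🧮 Scatter plots: weight by age and sex category', 'grey', attrs=['bold']))
    # Columns 4 to 7 are the weight measurements: whole_wt, shucked_wt, viscera_wt, shell_wt
    for column_ in abalone.columns[4:8]:
        # One panel per sex category (M, F, I), plotting this weight against age
        g = sns.FacetGrid(abalone, col="sex")
        g.map(sns.scatterplot, 'age', column_)
        plt.show()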

    👩🏼‍🏫 Observation:

    ➜ Abalone weight appears to increase with age, regardless of sex category.

    🔎 How can this be confirmed?

    ➜ One approach is a correlation test.

    ➜ The best-known is the Pearson correlation test, but its p-value calculation assumes that the variables are normally distributed.
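    The sample Pearson coefficient $r$ and the statistic behind its p-value are shown below. These are standard textbook formulas in our own notation, included to show where the normality assumption enters. Under $H_0: \rho = 0$, $t$ follows a Student t distribution with $n - 2$ degrees of freedom exactly when the pairs $(x_i, y_i)$ are bivariate normal. Otherwise that reference distribution is only an approximation.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}} \sim t_{n-2} \quad \text{under } H_0 : \rho = 0
```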

    👉🏼 First step:

    ➜ Determine whether each variable is normally distributed.

    ➜ We can use the Shapiro-Wilk test, which suits samples of fewer than 5000 observations (above that size, SciPy warns that its p-value may be inaccurate).
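    The first step can be sketched with `scipy.stats.shapiro`, run separately for each sex category. This is a minimal sketch under stated assumptions:
    - The data are a hypothetical synthetic stand-in with the same column names as `abalone.csv`.
    - The 5% significance level (`ALPHA`) is an assumed convention.
    - Only `age` and `whole_wt` are tested, for brevity.

    On the real data, the inner loop would run over `age` and the four weight columns `abalone.columns[4:8]`. A p-value above `ALPHA` only means normality is not rejected; it does not prove the variable is normal.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical synthetic data standing in for abalone.csv (same column names).
rng = np.random.default_rng(42)
n = 300
age = rng.uniform(2.5, 25.0, n)
abalone = pd.DataFrame({
    "sex": rng.choice(["M", "F", "I"], n),
    "age": age,
    # Right-skewed weight that grows with age (illustrative only)
    "whole_wt": 0.0002 * age**3 * rng.lognormal(0.0, 0.2, n),
})

ALPHA = 0.05  # assumed significance level

rows = []
for sex, group in abalone.groupby("sex"):
    for col in ["age", "whole_wt"]:
        w_stat, p_value = stats.shapiro(group[col])  # W statistic and p-value
        rows.append({
            "sex": sex,
            "variable": col,
            "W": w_stat,
            "p_value": p_value,
            # True means normality is NOT rejected at level ALPHA
            "normal": p_value > ALPHA,
        })

shapiro_results = pd.DataFrame(rows)
print(shapiro_results)
```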

    👉🏼 Second step:

    ➜ Use a correlation test.


    ➜ The test to use depends on the result of the Shapiro-Wilk test:

    ♢ If the variables are normally distributed, we use the Pearson test.

    ♢ If they are not normally distributed, we use the Kendall (tau) rank correlation test.
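    The decision rule above can be sketched as a small helper. It runs Shapiro-Wilk on both variables, uses `scipy.stats.pearsonr` only if normality is rejected for neither, and otherwise falls back to `scipy.stats.kendalltau`. Several parts are assumptions, not the notebook's own code:
    - The helper name `correlation_test` is hypothetical.
    - The 5% level is an assumed convention.
    - Requiring *both* variables to pass is our reading of the rule.
    - The synthetic data only mimic the real column names.

    Kendall's tau is rank-based, so it measures monotonic association without assuming normality.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level for the Shapiro-Wilk pre-check


def correlation_test(x, y, alpha=ALPHA):
    """Hypothetical helper: return (method, statistic, p_value).

    Pearson is used only if Shapiro-Wilk does not reject normality for
    either x or y; otherwise Kendall's tau (rank-based) is used.
    """
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)
    if p_x > alpha and p_y > alpha:
        stat, p = stats.pearsonr(x, y)
        return "pearson", stat, p
    stat, p = stats.kendalltau(x, y)
    return "kendall", stat, p


# Hypothetical synthetic data standing in for abalone.csv (same column names).
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(2.5, 25.0, n)
abalone = pd.DataFrame({
    "sex": rng.choice(["M", "F", "I"], n),
    "age": age,
    "whole_wt": 0.0002 * age**3 * rng.lognormal(0.0, 0.2, n),
})

# Apply the rule to age vs. whole weight within each sex category
rows = []
for sex, group in abalone.groupby("sex"):
    method, stat, p = correlation_test(group["age"], group["whole_wt"])
    rows.append({"sex": sex, "method": method, "statistic": stat, "p_value": p})

results = pd.DataFrame(rows)
print(results)
```

    On the real data, the same loop can be repeated for each of the four weight columns. A positive statistic with a small p-value would support the increase seen in the scatter plots.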
