Could we estimate the age of an abalone?
We are working for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of the abalones when they go to market. Determining an abalone's age involves counting the rings in a cross-section of the shell under a microscope. Because this method is cumbersome, the goal of this project is to estimate an abalone's age from its physical characteristics instead.
With this aim in mind, we need to answer the following questions:
- How does weight change with age for each of the three sex categories?
- Can we estimate an abalone's age using its physical characteristics?
- Which variables are the better predictors of an abalone's age?
Set Environment
The first step is to load the libraries that provide the functions used throughout the project. To do this, we use pacman, a package specialised in package management.
# Install pacman only if it is not already available
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
After installing pacman, p_load is used to install and load the other libraries.
suppressPackageStartupMessages(library("pacman"))
suppressWarnings(p_load(tidyverse, ggstatsplot, broom, questionr, cowplot, ggcorrplot, DT, janitor, inspectdf, psych, gameofthrones, performance, caret, xgboost))
We also install the gameofthrones package from GitHub to make its colour palettes available.
devtools::install_github("aljrico/gameofthrones")
suppressPackageStartupMessages(library(gameofthrones))
A pseudo-random seed is set to ensure the results are reproducible.
RNGkind(sample.kind = "Rounding")
set.seed(12345678)
A common theme is then set for all the graphics.
theme_set(theme_light() +
          theme(text = element_text(family = "Calibri")))
Import Data
For this project we have access to historical data from the UCI Machine Learning Repository. To import the data, we use read_csv.
abalone <- read_csv('data/abalone.csv', show_col_types = FALSE)
abalone
As can be seen, there are 4177 records and the following features:
- sex - M, F, and I (infant).
- length - longest shell measurement.
- diameter - perpendicular to the length.
- height - measured with meat in the shell.
- whole_wt - whole abalone weight.
- shucked_wt - the weight of abalone meat.
- viscera_wt - gut-weight.
- shell_wt - the weight of the dried shell.
- rings - number of rings in a shell cross-section.
- age - the age of the abalone, calculated as the number of rings + 1.5.
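The column descriptions above could also be made explicit at import time. The following is a minimal sketch, assuming the CSV header uses exactly the names listed above; `abalone_cols` is a hypothetical name introduced here for illustration, and the commented-out call reuses the path from the import step.

```r
library(readr)

# Column specification mirroring the feature list:
# sex is categorical (M, F, I); every other column is numeric.
abalone_cols <- cols(
  sex = col_factor(levels = c("M", "F", "I")),
  .default = col_double()
)

# Assumed usage with the file path from the import step:
# abalone <- read_csv("data/abalone.csv", col_types = abalone_cols)
```

Declaring the types up front means a malformed value is reported as a parsing problem rather than silently changing a column's type.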
The aim of the project is to estimate age without counting rings, and age is computed directly from the number of rings, so keeping rings as a predictor would leak the target. The rings variable is therefore excluded from this study.
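As a minimal sketch of this exclusion, the snippet below first checks that age equals rings + 1.5 and then drops the rings column. The data frame `toy_abalone` holds made-up values for illustration only and stands in for the real `abalone` table; `model_data` is likewise a hypothetical name.

```r
# Toy data with made-up values, standing in for the real `abalone` table.
toy_abalone <- data.frame(
  sex      = c("M", "F", "I"),
  whole_wt = c(0.50, 0.70, 0.20),
  rings    = c(10, 12, 5)
)
toy_abalone$age <- toy_abalone$rings + 1.5

# Sanity check: age is fully determined by rings.
stopifnot(isTRUE(all.equal(toy_abalone$age, toy_abalone$rings + 1.5)))

# Drop rings so it cannot leak the target into the predictors.
model_data <- toy_abalone[, setdiff(names(toy_abalone), "rings")]
names(model_data)
```

With the tidyverse loaded, `select(abalone, -rings)` would achieve the same result on the real data.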