Identifying Traits of Sports Talent in Malaysian Children Through Motor Performance

Background

Evaluating children's physical abilities is crucial for gaining insight into their growth and development, as well as for recognizing potential talent in sports. One common metric for this assessment is the Motor Performance Index (MPI), which measures different aspects of a child's motor skills.

Objectives

The primary objective of this report is to analyze datasets related to children's motor performance using summary statistics, visualizations, statistical models, and narratives. Specifically, it aims to:

  1. Explore the demographic profile and characteristics of the sample.
  2. Understand the relationship between the four motor skills.
  3. Explain how the children's attributes affect their motor skills.

Data Used

The dataset used in the analysis is a slightly cleaned version of the dataset described in the article "Kids motor performances datasets", published in the journal Data in Brief. It consists of a single CSV file in which each row represents a seven-year-old Malaysian child. The following lists describe its variables:

Four properties of motor skills were recorded.

  • POWER (cm): Distance of a two-footed standing jump.
  • SPEED (sec): Time taken to sprint 20 m.
  • FLEXIBILITY (cm): Distance reached forward in a sitting position.
  • COORDINATION (no.): Number of catches of a ball, out of ten.

Attributes of the children are included.

  • STATE: The Malaysian state where the child resides.
  • RESIDENTIAL: Whether the child lives in a rural or urban area.
  • GENDER: The child's gender, Female or Male.
  • AGE: The child's age in years.
  • WEIGHT (kg): The child's bodyweight in kg.
  • HEIGHT (cm): The child's height in cm.
  • BMI (kg/m2): The child's body mass index (weight in kg divided by the square of height in meters).
  • CLASS (BMI): Categorization of the BMI: "SEVERE THINNESS", "THINNESS", "NORMAL", "OVERWEIGHT", "OBESITY".

(Full details of these metrics are described in sections 2.2 to 2.5 of the linked article.)
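
As a quick check on the BMI definition listed above, the following minimal sketch computes BMI from a weight in kg and a height in cm. The helper name `compute_bmi` and the input values are illustrative assumptions; they are not part of the dataset or of the original analysis.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2.
# Height is supplied in cm, as in the dataset, and converted to meters.
compute_bmi <- function(weight_kg, height_cm) {
    height_m <- height_cm / 100
    weight_kg / height_m^2
}

# Illustrative values only (not taken from any row of the dataset)
example_bmi <- compute_bmi(weight_kg = 22, height_cm = 118)
print(round(example_bmi, 2))
```

Because the function is vectorized, it could in principle be applied to whole weight and height columns at once to cross-check a stored BMI column.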

Results & Discussion

Descriptive Analysis

The following describes the demographic profile and characteristics of the sample, which comprises 1998 seven-year-old children who attend national primary regional schools and took part in Malaysia's physical fitness test (SEGAK).

Numerical Variables
  • As expected, the mean age of the children is around 7, with a standard deviation of 0.05.
  • The mean weight is 22.21 kg, with a standard deviation of 5.41.
  • The mean height is 118.26 cm, with a standard deviation of 5.97.
  • The mean body mass index (BMI) is 15.77 kg/m2, with a standard deviation of 3.06.
  • The mean distance of a two-footed standing jump is 96.20 cm, with a standard deviation of 17.59.
  • The mean time taken to sprint 20 m is 5.16 sec, with a standard deviation of 0.71.
  • The mean distance reached forward in a sitting position is 26.26 cm, with a standard deviation of 4.93.
  • Out of ten, the mean number of ball catches is about 4, with a standard deviation of about 3.
  • The boxplots below suggest that all numerical variables are approximately symmetrically distributed about their medians.
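## ---------- Descriptive Analysis

# Packages used below; `motor_performance` (the data frame) and `num_vars`
# (the names of its numerical columns) are assumed to be defined earlier.
library(dplyr)
library(ggplot2)

## ----- Numerical Variables

# Subset numerical variable columns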
stacked_num_vars <- motor_performance %>%
    dplyr::select(all_of(num_vars)) %>%
    stack() %>%
    rename(Variable = ind) %>%
    mutate(Type = ifelse(Variable %in% c("AGE", "WEIGHT (kg)", "HEIGHT (cm)", "BMI (kg/m2)"),
                         "Attribute", "Motor skill"))

# Summary statistics for numerical variables (one row per variable)
num_data <- motor_performance %>%
    dplyr::select(all_of(num_vars))

sum_stats <- data.frame(
    Variable    = num_vars,
    Mean        = sapply(num_data, mean),
    `Std. Dev.` = sapply(num_data, sd),
    `Min.`      = sapply(num_data, min),
    Median      = sapply(num_data, median),
    `Max.`      = sapply(num_data, max),
    check.names = FALSE  # keep the column names exactly as written
)

# Replace the variable-name row labels with plain row numbers
rownames(sum_stats) <- seq_len(nrow(sum_stats))

# Shared text styling for the plot ("sans" is R's built-in sans-serif family)
label_colour <- "#65707C"
font_family <- "sans"

# Boxplots for numerical variables
boxplots <- ggplot(stacked_num_vars, aes(x = Variable, y = values, fill = Type)) +
    geom_boxplot(width = 0.75) +
    theme(legend.position = "top",
          legend.justification = 0.48,
          legend.key.size = unit(7, "mm"),
          legend.text = element_text(margin = margin(r = 10, unit = "pt"),
                                     size = 8.5,
                                     color = label_colour,
                                     family = font_family),
          legend.title = element_text(color = label_colour,
                                      face = "bold",
                                      size = 9,
                                      family = font_family),
          legend.key = element_rect(fill = NA),
          axis.title = element_text(color = label_colour,
                                    face = "bold",
                                    size = 8.5,
                                    family = font_family),
          axis.text = element_text(color = label_colour,
                                   size = 8,
                                   family = font_family),
          axis.line = element_line(colour = "grey",
                                   linewidth = 0.5),
          panel.grid.major = element_line(color = "grey",
                                          linetype = "dashed",
                                          linewidth = 0.25),
          panel.background = element_blank(),
          panel.border = element_rect(color = "grey40",
                                      fill = NA),
          panel.spacing = unit(2, "lines"),
          plot.title = element_text(color = label_colour,
                                    hjust = 0.5,
                                    face = "bold",
                                    size = 11,
                                    family = font_family)) +
    labs(x = "\nVariable \n(unit)\n", y = "", fill = "Type:  ") +
    ggtitle("\n Fig. 1: Box Plots of the Numerical Attributes and Motor Skills          ") +
    # Axis labels follow the column order of `num_vars`
    scale_x_discrete(labels = c("AGE",
                                "WEIGHT \n(kg)",
                                "HEIGHT \n(cm)",
                                "BMI \n(kg/m2)",
                                "POWER \n(cm)",
                                "SPEED \n(sec)",
                                "FLEXIBILITY \n(cm)",
                                "COORDINATION \n(no.)")) +
    scale_y_continuous(expand = c(0.01, 0),
                       limits = c(0, 175),
                       breaks = seq(0, 175, by = 25)) +
    scale_fill_manual(values = c("#025C70",
                                 "#007E6C"))

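# Extract the computed boxplot statistics (one row per box) from the built plot
dat <- ggplot_build(boxplots)$data[[1]]

# Restyle the median lines: overlay a light grey segment across each box,
# drawn 0.15 units below the computed median
final_boxplots <- boxplots +
    geom_segment(data = dat,
                 aes(x = xmin, xend = xmax,
                     y = middle - .15, yend = middle - .15),
                 color = "grey75",
                 linewidth = 0.5,
                 inherit.aes = FALSE)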
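
The symmetry read off the box plots can also be checked numerically. The sketch below computes a moment-based sample skewness, where values near 0 are consistent with a symmetric distribution. The helper `sample_skewness` and the simulated vector are assumptions made for illustration; they are not part of the original analysis and do not use the real data.

```r
# Hypothetical helper: sample skewness (third standardized moment).
# Values near 0 suggest symmetry; positive values suggest a longer right tail.
sample_skewness <- function(x) {
    x <- x[!is.na(x)]
    centred <- x - mean(x)
    mean(centred^3) / mean(centred^2)^1.5
}

# Simulated stand-in for one numerical column (NOT the real data)
set.seed(1)
simulated_values <- rnorm(500, mean = 100, sd = 15)

print(sample_skewness(simulated_values))

# With the real data, the helper could be applied column-wise, e.g.
# sapply(motor_performance[num_vars], sample_skewness)
```

A bounded count such as COORDINATION (0 to 10 catches) is the kind of variable where a numerical check like this may be more informative than a box plot alone.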
Categorical Variables
  • The five Malaysian states with the most children in the sample are:
    1. Selangor - 349 (17.5%)
    2. Johor - 241 (12.1%)
    3. Sabah - 202 (10.1%)
    4. Sarawak - 199 (10.0%)
    5. Perak - 166 (8.3%)
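
Counts and percentages like those in the list above can be obtained from a frequency table. The following is a minimal sketch under stated assumptions: the `state` vector is a small made-up stand-in for the STATE column, and the object names are hypothetical rather than taken from the original code.

```r
# Illustrative stand-in for the STATE column (NOT the real data)
state <- c("Selangor", "Johor", "Selangor", "Sabah", "Johor", "Selangor")

# Count children per state, largest first
state_counts <- sort(table(state), decreasing = TRUE)

# Convert counts to percentages of the whole sample, rounded to 1 decimal
state_pct <- round(100 * prop.table(state_counts), 1)

# Combine into one table and keep the five most frequent states
top_states <- data.frame(
    State   = names(state_counts),
    Count   = as.vector(state_counts),
    Percent = as.vector(state_pct)
)
print(head(top_states, 5))
```

The same arithmetic reproduces the reported figures: 349 of 1998 children is 17.5% after rounding, which matches the value listed for Selangor.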