
Children's Motor Performance

📖 Background

Measuring children's physical abilities helps in understanding growth and development, and helps sports talent scouts identify gifted individuals. A common measure of physical ability is the Motor Performance Index.

An athletics talent scout has hired you to find insights in a dataset to assist their search for the next generation of track and field stars.

💾 The data

The dataset is a slightly cleaned version of a dataset described in the article "Kids motor performances datasets" from the Data in Brief journal.

The dataset consists of a single CSV file, data/motor-performance.csv.

Each row represents a seven-year-old Malaysian child.

Four properties of motor skills were recorded.

  • POWER (cm): Distance of a two-footed standing jump.
  • SPEED (sec): Time taken to sprint 20m.
  • FLEXIBILITY (cm): Distance reached forward in a sitting position.
  • COORDINATION (no.): Number of catches of a ball, out of ten.

Full details of these metrics are described in sections 2.2 to 2.5 of the linked article.
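
Note that SPEED is a time, so a lower value means a faster child, whereas higher values are better for the other three metrics. The sketch below shows one way to put SPEED on a "higher is better" scale by converting it to average velocity over the 20 m sprint. The rows are made up for illustration, the column names are assumed to match the labels in the list above, and `VELOCITY (m/s)` is a hypothetical derived column that is not part of the dataset.

```python
import pandas as pd

# Made-up example rows; column names are assumed to match the list above.
sample = pd.DataFrame({
    "POWER (cm)": [110.0, 95.0],
    "SPEED (sec)": [4.0, 5.0],
})

SPRINT_DISTANCE_M = 20  # sprint length stated in the SPEED description

# Convert sprint time to average velocity so that higher means faster.
# "VELOCITY (m/s)" is a hypothetical helper column, not in the original data.
sample["VELOCITY (m/s)"] = SPRINT_DISTANCE_M / sample["SPEED (sec)"]
print(sample)
```

Keeping the original SPEED column alongside the derived one avoids confusion when comparing results with the source article.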

Attributes of the children are included.

  • STATE: The Malaysian state where the child resides.
  • RESIDENTIAL: Whether the child lives in a rural or urban area.
  • GENDER: The child's gender, Female or Male.
  • AGE: The child's age in years.
  • WEIGHT (kg): The child's bodyweight in kg.
  • HEIGHT (CM): The child's height in cm.
  • BMI (kg/m2): The child's body mass index (weight in kg divided by height in meters squared).
  • CLASS (BMI): Categorization of the BMI: "SEVERE THINNESS", "THINNESS", "NORMAL", "OVERWEIGHT", "OBESITY".
import pandas as pd

motor_performance = pd.read_csv("data/motor-performance.csv")
motor_performance
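
The BMI definition given above (weight in kg divided by height in meters squared) can be used as a quick consistency check on the recorded values. The following is a minimal sketch on made-up rows, assuming the column names match the labels in the attribute list. The tolerance of 0.1 is an arbitrary assumption, since the rounding used in the dataset is not stated.

```python
import numpy as np
import pandas as pd

# Made-up example rows; the column names are assumed from the attribute list.
sample = pd.DataFrame({
    "WEIGHT (kg)": [20.0, 25.0, 30.0],
    "HEIGHT (CM)": [120.0, 125.0, 130.0],
    "BMI (kg/m2)": [13.9, 16.0, 20.0],
})

# BMI = weight in kg / (height in m)^2; height is recorded in cm.
height_m = sample["HEIGHT (CM)"] / 100
bmi_check = sample["WEIGHT (kg)"] / height_m ** 2

# Flag rows whose recorded BMI differs from the recomputed value
# by more than an assumed tolerance of about 0.1.
mismatch = ~np.isclose(sample["BMI (kg/m2)"], bmi_check, atol=0.1)
print(sample[mismatch])
```

Rows flagged this way are candidates for a closer look rather than automatic removal, because the mismatch could come from rounding or from an error in any of the three columns.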

💪 Challenge

Explore the dataset to understand how the attributes of the children affect the motor skills, and the relationship between the four motor skills. Your published notebook should contain a short report on the motor skills, including summary statistics, visualizations, statistical models, and text describing any insights you found.

🧑‍⚖️ Judging criteria

The publications will be graded as follows:

  • [20%] Technical approach.
    • Is the approach technically sound?
    • Is the code high quality?
  • [20%] Visualizations
    • Are the visualizations suitable?
    • Can clear insights be gleaned from the visualizations?
  • [30%] Storytelling
    • Does the data underpin the narrative?
    • Does the narrative read coherently?
    • Is the narrative detailed but concise?
  • [30%] Insights and recommendations
    • How clear are the insights and recommendations?
    • Are the insights relevant to the domain?
    • Are limitations of the analysis recognized?

In the event that multiple submissions have an equally high score, the publication with the most upvotes wins.

📘 Rules

To be eligible to win, you must:

  • Submit your response before the deadline. All responses must be submitted in English.

Entrants must be:

  • 18+ years old.
  • Allowed to take part in a skill-based competition from their country.

Entrants cannot:

  • Be in a country currently sanctioned by the U.S. government.

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your work.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

Created by Mehmet Alper ŞAHİN

Contents

Libraries
Exploratory Data Analysis
  • Drop Duplicates
  • Distribution of Categorical Values
  • Dummy Variables for Categorical Values
  • Outlier Detection & Remove Outliers
  • Correlation of Independent Variables
  • Conformity to Normal Distribution
    • Shapiro-Wilk Test
    • Kolmogorov-Smirnov Test
  • Normal Transform
Multiple Regression
Clustering
  • Normalization
  • Elbow Method to determine the number of clusters
  • Non-Hierarchical procedures
    • K-means
  • Hierarchical procedures
    • Linkage Methods
    • Variance Methods (Ward’s method)
Decision Tree
Random Forest
Conclusion and Recommendations

Libraries:

# Data handling and statistics
import pandas as pd
import numpy as np
import scipy as sp
from scipy import stats
import statsmodels.api as sm

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Clustering
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import linkage, dendrogram

# Classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, f1_score, recall_score, confusion_matrix, ConfusionMatrixDisplay

# Silence all warnings in the notebook output (this also hides deprecation notices)
from warnings import filterwarnings
filterwarnings('ignore')

Exploratory Data Analysis:

raw_data = pd.read_csv('data/motor-performance.csv')
raw_data.head(10)

# Record the row count, drop exact duplicate rows, and report how many were removed.
shape_first = raw_data.shape[0]
raw_data.drop_duplicates(inplace=True)
print(f"{shape_first - raw_data.shape[0]} duplicate row(s) removed from the raw data.")
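
To make the duplicate-removal step easier to inspect, the sketch below counts duplicates before dropping them, using a small made-up frame rather than the real data. It uses `duplicated()` together with the non-mutating form of `drop_duplicates()`. The names `demo`, `n_duplicates`, and `deduplicated` are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Made-up rows; the second and third are exact duplicates of each other.
demo = pd.DataFrame({
    "GENDER": ["Female", "Male", "Male"],
    "POWER (cm)": [100.0, 110.0, 110.0],
})

# duplicated() marks every repeat of an earlier row (keep="first" by default).
n_duplicates = int(demo.duplicated().sum())

# Non-mutating alternative to inplace=True; the index is rebuilt afterwards.
deduplicated = demo.drop_duplicates().reset_index(drop=True)
print(f"{n_duplicates} duplicate row(s) removed.")
```

One caveat worth noting: the attribute list shows no child identifier, so two different children with identical measurements would also be treated as duplicates. Inspecting the flagged rows with `demo[demo.duplicated(keep=False)]` before dropping them is a cheap safeguard.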