Competition - Children's Motor Performance

Children's Motor Performance

📖 Background

Measuring the physical abilities of children helps researchers understand growth and development, and helps sports talent scouts identify gifted individuals. A common measure of physical ability is the Motor Performance Index.

An athletics talent scout has hired you to find insights in a dataset to assist their search for the next generation of track and field stars.

💾 The data

The dataset is a slightly cleaned version of the dataset described in the article "Kids motor performances datasets", published in the journal Data in Brief.

The dataset consists of a single CSV file, data/motor-performance.csv.

Each row represents a seven-year-old Malaysian child.

Four motor skills were recorded:

- POWER (cm): Distance of a two-footed standing jump.
- SPEED (sec): Time taken to sprint 20 m.
- FLEXIBILITY (cm): Distance reached forward in a sitting position.
- COORDINATION (no.): Number of catches of a ball, out of ten.

Full details of these metrics are described in sections 2.2 to 2.5 of the linked article.

The following attributes of each child are also included:

- STATE: The Malaysian state where the child resides.
- RESIDENTIAL: Whether the child lives in a rural or urban area.
- GENDER: The child's gender, Female or Male.
- AGE: The child's age in years.
- WEIGHT (kg): The child's bodyweight in kg.
- HEIGHT (CM): The child's height in cm.
- BMI (kg/m2): The child's body mass index (weight in kg divided by height in meters squared).
- CLASS (BMI): Categorization of the BMI: "SEVERE THINNESS", "THINNESS", "NORMAL", "OVERWEIGHT", "OBESITY".
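The BMI definition above converts height from centimeters to meters before squaring. A minimal sketch of that formula in pandas, assuming the column names listed above; the two rows and the helper `compute_bmi` are hypothetical and for illustration only:

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data comes from
# data/motor-performance.csv with the column names listed above.
children = pd.DataFrame({
    "WEIGHT (kg)": [25.0, 20.0],
    "HEIGHT (CM)": [120.0, 125.0],
})

def compute_bmi(weight_kg: pd.Series, height_cm: pd.Series) -> pd.Series:
    """BMI = weight (kg) / height (m)^2, with height converted from cm to m."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

children["BMI (computed)"] = compute_bmi(children["WEIGHT (kg)"], children["HEIGHT (CM)"])
print(children.round(2))
```

On the real data, comparing this recomputed value against the provided BMI (kg/m2) column is a quick consistency check on the weight and height entries.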

import pandas as pd

motor_performance = pd.read_csv("data/motor-performance.csv")
motor_performance

💪 Challenge

Explore the dataset to understand how the attributes of the children affect the motor skills, and the relationship between the four motor skills. Your published notebook should contain a short report on the motor skills, including summary statistics, visualizations, statistical models, and text describing any insights you found.
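As a starting point for the summary statistics and the relationships between the four motor skills, the sketch below groups the skills by gender and computes their pairwise correlations. It is a minimal sketch, assuming the column names listed above; the sample values are made up, and the choice of Spearman (rank) correlation is an assumption, not a requirement of the challenge.

```python
import pandas as pd

MOTOR_SKILLS = ["POWER (cm)", "SPEED (sec)", "FLEXIBILITY (cm)", "COORDINATION (no.)"]

# Hypothetical sample standing in for the real CSV (values are made up).
sample = pd.DataFrame({
    "GENDER": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "POWER (cm)": [95.0, 110.0, 100.0, 120.0, 90.0, 115.0],
    "SPEED (sec)": [5.1, 4.6, 4.9, 4.4, 5.3, 4.5],
    "FLEXIBILITY (cm)": [25.0, 22.0, 27.0, 20.0, 26.0, 21.0],
    "COORDINATION (no.)": [6, 7, 5, 8, 4, 7],
})

# Mean and standard deviation of each motor skill, split by gender.
summary = sample.groupby("GENDER")[MOTOR_SKILLS].agg(["mean", "std"])
print(summary.round(2))

# Pairwise Spearman correlations between the four motor skills.
# SPEED is a sprint time, so a lower value means a faster child and
# a negative correlation with POWER means "jumps further, runs faster".
corr = sample[MOTOR_SKILLS].corr(method="spearman")
print(corr.round(2))
```

Rank correlation is used here because it does not assume linear relationships or normally distributed skills; Pearson correlation is a reasonable alternative once normality has been checked.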

🧑‍⚖️ Judging criteria

The publications will be graded as follows:

[20%] Technical approach.
- Is the approach technically sound?
- Is the code high quality?
[20%] Visualizations
- Are the visualizations suitable?
- Can clear insights be gleaned from the visualizations?
[30%] Storytelling
- Does the data underpin the narrative?
- Does the narrative read coherently?
- Is the narrative detailed but concise?
[30%] Insights and recommendations
- How clear are the insights and recommendations?
- Are the insights relevant to the domain?
- Are limitations of the analysis recognized?

In the event that multiple submissions have an equally high score, the publication with the most upvotes wins.

📘 Rules

To be eligible to win, you must:

Submit your response before the deadline. All responses must be submitted in English.

Entrants must be:

- 18+ years old.
- Allowed to take part in a skill-based competition from their country.

Entrants cannot:

- Be in a country currently sanctioned by the U.S. government.

✅ Checklist before publishing

- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells, such as the judging criteria, so the notebook is focused on your work.
- Check that all the cells run without errors.

⌛️ Time is ticking. Good luck!

Created by Mehmet Alper ŞAHİN

Libraries

Exploratory Data Analysis

- Drop Duplicates
- Distribution of Categorical Variables
- Dummy Variables for Categorical Variables
- Outlier Detection and Removal
- Correlation of Independent Variables
- Conformity to Normal Distribution
    - Shapiro-Wilk Test
    - Kolmogorov-Smirnov Test
- Transformation to Normality

Multiple Regression

Clustering

- Normalization
- Elbow Method to Determine the Number of Clusters
- Non-Hierarchical Procedures
    - K-means
- Hierarchical Procedures
    - Linkage Methods
    - Variance Methods (Ward's Method)

Decision Tree

Random Forest

Conclusion and Recommendations
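The outline checks conformity to the normal distribution with the Shapiro-Wilk and Kolmogorov-Smirnov tests. A minimal sketch of both tests with SciPy, run on a hypothetical right-skewed sample standing in for one motor-skill column; the 0.05 significance level is an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
# Hypothetical, right-skewed sample standing in for one motor-skill column.
values = rng.lognormal(mean=3.0, sigma=0.5, size=500)

# Shapiro-Wilk: the null hypothesis is that the data come from a normal distribution.
sw_stat, sw_p = stats.shapiro(values)

# Kolmogorov-Smirnov against a normal with the sample's own mean and std.
# Estimating the parameters from the same data makes this p-value
# conservative (too large); the Lilliefors test corrects for that.
ks_stat, ks_p = stats.kstest(values, "norm", args=(values.mean(), values.std(ddof=1)))

alpha = 0.05
print(f"Shapiro-Wilk: W={sw_stat:.3f}, p={sw_p:.3g}, normal? {sw_p > alpha}")
print(f"Kolmogorov-Smirnov: D={ks_stat:.3f}, p={ks_p:.3g}, normal? {ks_p > alpha}")
```

A small p-value is evidence against normality; with large samples, both tests flag even mild departures, so a histogram or Q-Q plot is a useful companion check.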

Libraries:

# Data handling and statistics
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Clustering (yellowbrick is a third-party package and must be installed separately)
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from sklearn.metrics import silhouette_score
import scipy as sp
from scipy.cluster.hierarchy import linkage, dendrogram

# Scaling and classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score, f1_score,
                             recall_score, confusion_matrix, ConfusionMatrixDisplay)

# Suppress library warnings to keep the notebook output readable
from warnings import filterwarnings
filterwarnings('ignore')

Exploratory Data Analysis:

raw_data = pd.read_csv('data/motor-performance.csv')
raw_data.head(10)

# Count the rows before dropping exact duplicate rows, then report how many were removed
shape_first = raw_data.shape[0]
raw_data.drop_duplicates(inplace=True)
print(f"{shape_first - raw_data.shape[0]} duplicate row(s) removed from raw data.")
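The next steps in the outline are dummy variables for the categorical columns and outlier removal. The outline does not say which outlier rule is used, so the sketch below assumes the common 1.5 × IQR rule; the rows and the helper `remove_iqr_outliers` are hypothetical, and this is not the author's code.

```python
import pandas as pd

# Hypothetical rows; column names follow the data description above.
df = pd.DataFrame({
    "RESIDENTIAL": ["Rural", "Urban", "Urban", "Rural", "Urban", "Rural"],
    "GENDER": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "POWER (cm)": [95.0, 110.0, 100.0, 120.0, 105.0, 300.0],  # 300 is an implausible jump
})

# One-hot encode the categorical columns; drop_first=True avoids perfectly
# collinear dummies in a regression model that has an intercept.
encoded = pd.get_dummies(df, columns=["RESIDENTIAL", "GENDER"], drop_first=True, dtype=int)

def remove_iqr_outliers(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

cleaned = remove_iqr_outliers(encoded, "POWER (cm)")
print(encoded.columns.tolist())
print(f"{len(encoded) - len(cleaned)} outlier row(s) removed.")
```

The IQR rule does not assume normality, which suits skewed measurements; applied column by column to all four motor skills, it can remove a noticeable share of rows, so the number removed is worth reporting.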
