Data Analyst Professional Exam
Data Analyst Professional Practical Exam Submission
You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.
You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.
1. Data Validation
The dataset contains 1500 rows and 8 columns before cleaning and validation. I validated every column against the criteria in the dataset table:
- booking_id: unique identifier of the booking, with no missing values, as described. No cleaning was needed.
- months_as_member: number of months as a member of this fitness club, minimum 1 month, as described. No cleaning was needed.
- weight: the member's weight in kg, as described. 20 missing values were replaced with the overall average weight.
- days_before: numeric values with no missing values. Removed the "days" text from the entries and converted the dtype to int.
- day_of_week: day of the week of the class. No missing values. Fixed the inconsistent weekday names ("Fri.", "Monday", "Wednesday") and changed the dtype to category.
- time: time of day of the class, as described. No missing values. Changed the dtype to category.
- category: category of the fitness class. 13 missing values (recorded as "-") were replaced with "Unknown". Changed the dtype to category.
- attended: as described. No cleaning was needed.
After the data validation, the dataset contains 1500 rows and 8 columns without missing values.
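The cleaning steps listed above can be sketched on a few toy rows. This is a minimal illustration, not the exam data: the rows in `raw` are made up to mimic the issues described, and the function name `clean_fitness_data` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that mimic the issues described above; not real exam data.
raw = pd.DataFrame({
    'booking_id': [1, 2, 3, 4],
    'months_as_member': [17, 10, 16, 5],
    'weight': [79.56, np.nan, 74.53, 86.12],
    'days_before': ['8', '2', '14 days', '10'],
    'day_of_week': ['Wed', 'Monday', 'Fri.', 'Wednesday'],
    'time': ['PM', 'AM', 'AM', 'AM'],
    'category': ['Strength', 'HIIT', '-', 'Cycling'],
    'attended': [0, 0, 1, 0],
})


def clean_fitness_data(df):
    """Return a cleaned copy of df (illustrative helper, not part of any library)."""
    out = df.copy()
    # Replace missing weights with the overall mean weight
    out['weight'] = out['weight'].fillna(out['weight'].mean())
    # Drop the 'days' text, trim whitespace, then cast to int
    out['days_before'] = (
        out['days_before'].astype(str)
        .str.replace('days', '', regex=False)
        .str.strip()
        .astype(int)
    )
    # Map the inconsistent weekday spellings to three-letter names
    out['day_of_week'] = (
        out['day_of_week']
        .replace({'Fri.': 'Fri', 'Monday': 'Mon', 'Wednesday': 'Wed'})
        .astype('category')
    )
    out['time'] = out['time'].astype('category')
    # '-' is treated as the marker for a missing category
    out['category'] = out['category'].replace('-', 'Unknown').astype('category')
    return out


cleaned = clean_fitness_data(raw)
print(cleaned.isna().sum().sum())  # total number of missing values left
print(cleaned.dtypes)
```

Working on a copy keeps the raw frame available for comparison, which makes it easier to confirm that the row and column counts are unchanged by the cleaning.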
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# set_theme() resets the palette to its default, so pass the palette to it directly
sns.set_theme(palette='colorblind')
pd.set_option('display.expand_frame_repr', False)
df = pd.read_csv('fitness_class_2212.csv')
df.head()
df.info()
df.describe()
df.isna().sum()
mean_weight = df['weight'].mean()
# Assign the result back rather than calling fillna(inplace=True) on a column selection
df['weight'] = df['weight'].fillna(mean_weight)
df.isna().sum()
# Remove the 'days' text so the column can be cast to int
df['days_before'] = df['days_before'].str.replace('days', '', regex=False).str.strip()
df['days_before'] = df['days_before'].astype(int)
df.info()
# regex=False so the '.' in 'Fri.' is matched literally
df['day_of_week'] = df['day_of_week'].str.replace('Fri.', 'Fri', regex=False)
df['day_of_week'] = df['day_of_week'].str.replace('Monday', 'Mon', regex=False)
df['day_of_week'] = df['day_of_week'].str.replace('Wednesday', 'Wed', regex=False)
df['day_of_week'] = df['day_of_week'].astype('category')
df['time'] = df['time'].astype('category')
# '-' is the value replaced here as the missing-category marker
df['category'] = df['category'].str.replace('-', 'Unknown', regex=False)
df['category'] = df['category'].astype('category')
df.info()
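As a possible final check, the validation criteria can be turned into explicit tests that run after cleaning. The sketch below is an assumption-laden illustration: the function `validation_problems`, the toy frame, and the `VALID_DAYS` set of three-letter weekday labels are all hypothetical and not taken from the exam material.

```python
import pandas as pd

# Hypothetical, already-cleaned rows used only to exercise the checks.
cleaned = pd.DataFrame({
    'booking_id': [1, 2, 3],
    'months_as_member': [17, 10, 16],
    'weight': [79.56, 80.07, 74.53],
    'days_before': [8, 2, 14],
    'day_of_week': pd.Categorical(['Wed', 'Mon', 'Fri']),
    'time': pd.Categorical(['PM', 'AM', 'AM']),
    'category': pd.Categorical(['Strength', 'HIIT', 'Unknown']),
    'attended': [0, 0, 1],
})

# Assumed set of valid weekday labels after cleaning
VALID_DAYS = {'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'}


def validation_problems(df):
    """Collect readable problem descriptions instead of stopping at the first one."""
    problems = []
    if df.isna().any().any():
        problems.append('missing values remain')
    if not df['booking_id'].is_unique:
        problems.append('booking_id is not unique')
    if (df['months_as_member'] < 1).any():
        problems.append('months_as_member below 1')
    if not pd.api.types.is_integer_dtype(df['days_before']):
        problems.append('days_before is not an integer column')
    unexpected = set(df['day_of_week'].unique()) - VALID_DAYS
    if unexpected:
        problems.append(f'unexpected day_of_week labels: {sorted(unexpected)}')
    return problems


print(validation_problems(cleaned))
```

Returning a list of problems rather than raising on the first failure shows every remaining issue in one pass, which suits a write-up where each column is reported separately.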