Data Scientist Professional Practical Exam Submission

Use this template to write up your summary for submission. Code in Python or R needs to be included.

📝 Task List

Your written report should include code, output, and written summaries of the following:

  • Data Validation:
    • Describe validation and cleaning steps for every column in the data
  • Exploratory Analysis:
    • Include two different graphics showing single variables only to demonstrate the characteristics of data
    • Include at least one graphic showing two or more variables to represent the relationship between features
    • Describe your findings
  • Model Development
    • Include your reasons for selecting the models you use as well as a statement of the problem type
    • Code to fit the baseline and comparison models
  • Model Evaluation
    • Describe the performance of the two models based on an appropriate metric
  • Business Metrics
    • Define a way to compare your model performance to the business
    • Describe how your models perform using this approach
  • Final summary including recommendations that the business should undertake

# Import the required libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv("recipe_site_traffic_2212.csv")
df.head(10)
df.info()

Data Validation

df.isna().sum()
# Drop rows where all four nutritional columns are missing: the missing values
# look random and appear to fall in the same rows of these four columns
df.dropna(subset=['calories', 'carbohydrate', 'sugar', 'protein'], how='all', inplace=True)
df.isna().sum()

As hypothesised, the missing values occurred in all four of these columns.
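
One way to check this kind of hypothesis directly is to count how many of the four columns are missing in each row. The sketch below uses a small hypothetical frame (`toy`) in place of the real data; the values in it are invented for illustration only. If missingness always co-occurs, the per-row counts should be only 0 or 4, and dropping with `how='all'` should leave no missing values in these columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the recipe data (values are made up)
toy = pd.DataFrame({
    "calories":     [35.5, np.nan, 914.3, np.nan],
    "carbohydrate": [38.6, np.nan, 42.7, np.nan],
    "sugar":        [0.7, np.nan, 3.1, np.nan],
    "protein":      [0.9, np.nan, 2.9, np.nan],
})
nutrient_cols = ["calories", "carbohydrate", "sugar", "protein"]

# Number of missing nutrient values in each row
missing_per_row = toy[nutrient_cols].isna().sum(axis=1)

# How many rows have 0, 1, 2, 3 or 4 of the columns missing
pattern = missing_per_row.value_counts().sort_index()
print(pattern)

# how='all' drops only the rows where every listed column is NaN
cleaned = toy.dropna(subset=nutrient_cols, how="all")
remaining_missing = int(cleaned[nutrient_cols].isna().sum().sum())
print(remaining_missing)
```

If `pattern` showed counts other than 0 and 4 on the real data, some rows would be only partially missing, and `how='all'` would leave those rows in place.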

df.info()
df['category'].unique()
df[df['category'] == 'Chicken Breast'].info()
df[df['category'] == 'Chicken'].info()
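
The two checks above compare the 'Chicken Breast' and 'Chicken' groups. If the intent is to treat them as a single category (an assumption here, since the original does not state it), a minimal sketch on a hypothetical frame could look like this:

```python
import pandas as pd

# Hypothetical toy frame; the real data has many more rows and categories
toy = pd.DataFrame(
    {"category": ["Chicken", "Chicken Breast", "Pork", "Chicken Breast"]}
)

# Category sizes before merging
counts_before = toy["category"].value_counts()

# Map the more specific label onto the broader one (assumed cleaning choice)
toy["category"] = toy["category"].replace({"Chicken Breast": "Chicken"})

# Category sizes after merging
counts_after = toy["category"].value_counts()
print(counts_before)
print(counts_after)
```

Comparing the counts before and after confirms that no rows were lost and that only the label changed.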
df['servings'].unique()
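
If `unique()` reveals entries that mix a number with text, the column can be coerced to integers by extracting the digits. The sketch below is an illustration on hypothetical values, not a statement about what the real column contains; the example strings are assumptions.

```python
import pandas as pd

# Hypothetical values; the text suffix is an assumed example of dirty input
toy = pd.DataFrame({"servings": ["4", "6", "4 as a snack", "2"]})

# Pull out the first run of digits from each entry
digits = toy["servings"].astype(str).str.extract(r"(\d+)", expand=False)

# Convert to a nullable integer type; unparseable entries would become <NA>
toy["servings"] = pd.to_numeric(digits, errors="coerce").astype("Int64")
print(toy["servings"].unique())
```

The nullable `Int64` dtype is used so that any entry without digits ends up as a missing value rather than raising an error.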