Certification - Data Scientist Professional - Recipe Site Traffic
Data Scientist Professional Practical Exam Submission
Use this template to write up your summary for submission. Code in Python or R needs to be included.
📝 Task List
Your written report should include code, output, and written text summaries of the following:
- Data Validation:
- Describe validation and cleaning steps for every column in the data
- Exploratory Analysis:
- Include two different graphics showing single variables only to demonstrate the characteristics of data
- Include at least one graphic showing two or more variables to represent the relationship between features
- Describe your findings
- Model Development
- Include your reasons for selecting the models you use as well as a statement of the problem type
- Code to fit the baseline and comparison models
- Model Evaluation
- Describe the performance of the two models based on an appropriate metric
- Business Metrics
- Define a way to compare your model performance to the business
- Describe how your models perform using this approach
- Final summary including recommendations that the business should undertake
# Start coding here...
# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv("recipe_site_traffic_2212.csv")
df.head(10)
df.info()
Data Validation
df.isna().sum()
# Drop rows where all four nutritional columns are missing; the missing values appear to occur at random, in these four columns together
df.dropna(subset=['calories', 'carbohydrate', 'sugar', 'protein'], how='all', inplace=True)
df.isna().sum()
As hypothesised, the missing values occurred in all four of these columns together.
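The hypothesis that the four nutritional columns are missing together can also be checked directly before dropping anything. The following is a minimal sketch on a small, hypothetical toy table (the values and the names `toy`, `nutrition_cols`, and `missing_per_row` are illustrative assumptions, not taken from the real file): it counts how many of the four columns are missing in each row, which should only ever be 0 or 4 if the hypothesis holds.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the recipe table (not the real file).
toy = pd.DataFrame({
    "recipe": [1, 2, 3, 4],
    "calories": [35.5, np.nan, 914.3, np.nan],
    "carbohydrate": [38.6, np.nan, 42.7, np.nan],
    "sugar": [0.7, np.nan, 3.1, np.nan],
    "protein": [0.9, np.nan, 2.9, np.nan],
})

nutrition_cols = ["calories", "carbohydrate", "sugar", "protein"]

# Count how many of the four nutritional columns are missing in each row.
missing_per_row = toy[nutrition_cols].isna().sum(axis=1)

# If missingness always co-occurs, only the counts 0 and 4 appear here.
missing_pattern = missing_per_row.value_counts().sort_index()
print(missing_pattern)

# Dropping with how="all" then removes exactly the fully missing rows.
cleaned = toy.dropna(subset=nutrition_cols, how="all")
remaining_missing = int(cleaned[nutrition_cols].isna().sum().sum())
print(remaining_missing)
```

On the real data, the same check would be run with `df` in place of `toy`; any count other than 0 or 4 would mean some rows are only partially missing and would survive `how='all'`.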
df.info()
df['category'].unique()
df[df['category'] == 'Chicken Breast'].info()
df[df['category'] == 'Chicken'].info()
df['servings'].unique()
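The category and servings checks above could be condensed into a single pass. This is only a sketch under stated assumptions: the toy table, its labels, and the `expected_categories` set are hypothetical and do not come from the data dictionary. It counts each category label, lists labels outside the expected set, and checks whether every servings value parses as a number.

```python
import pandas as pd

# Hypothetical toy data; the category labels and servings values are illustrative only.
toy = pd.DataFrame({
    "category": ["Chicken", "Chicken Breast", "Pork", "Chicken", "Pork"],
    "servings": ["4", "6", "2", "4", "1"],
})

# Hypothetical set of expected categories (an assumption, not from the data dictionary).
expected_categories = {"Chicken", "Pork"}

# value_counts lists every label with its frequency in one call.
category_counts = toy["category"].value_counts()
print(category_counts)

# Labels present in the data but absent from the expected set.
unexpected = sorted(set(toy["category"].unique()) - expected_categories)
print(unexpected)

# Servings values that cannot be parsed as numbers become NaN with errors="coerce".
servings_numeric = pd.to_numeric(toy["servings"], errors="coerce")
unparseable = int(servings_numeric.isna().sum())
print(unparseable)
```

A non-empty `unexpected` list or a non-zero `unparseable` count would point to values that need cleaning before modelling.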