🏆Certification - Data Scientist - Recipe Site Traffic (copy)

Data Scientist Professional Practical Exam Submission

Check out my video presentation: https://youtu.be/htqnv6uulPM?si=GpYIN-_kzTkpciTL

📝 Task List

Your written report should include code, output, and written text summaries of the following:

Data Validation:
- Describe validation and cleaning steps for every column in the data
Exploratory Analysis:
- Include two different graphics showing single variables only to demonstrate the characteristics of data
- Include at least one graphic showing two or more variables to represent the relationship between features
- Describe your findings
Model Development:
- Include your reasons for selecting the models you use as well as a statement of the problem type
- Code to fit the baseline and comparison models
Model Evaluation:
- Describe the performance of the two models based on an appropriate metric
Business Metrics:
- Define a way to compare your model performance to the business
  - How should the business monitor what they want to achieve?
- Describe how your models perform using this approach
  - Estimate the initial value(s) for the metric based on the current data
  - Initial accuracy of high-traffic recipes
Final summary including recommendations that the business should undertake

Recipe Site Traffic

# libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_style('whitegrid')

from matplotlib.colors import LinearSegmentedColormap
# custom palette
palette = ["#E4E0E1", "#D6C0B3", "#AB886D", "#493628"]
palette_reversed = palette[::-1]
# Create a continuous colormap from the custom palette
cmap = LinearSegmentedColormap.from_list("custom_food_cmap", palette)
print(palette, palette_reversed)

# import dataset
recipe_site_traffic = pd.read_csv('recipe_site_traffic_2212.csv')
recipe_site_traffic.head()

Data Validation

Data Validation Summary

The original dataset has 947 rows and 8 columns. After dropping the rows with missing nutritional values, 895 rows remain.

- recipe is numeric, 947 unique values, no missing values. No cleaning is needed.
- calories is numeric, 52 missing values, no negative values. I'll drop the rows with missing values.
- carbohydrate is numeric, 52 missing values, no negative values. I'll drop the rows with missing values.
- sugar is numeric, 52 missing values, no negative values. I'll drop the rows with missing values.
- protein is numeric, 52 missing values, no negative values. I'll drop the rows with missing values.
- category is string, 11 unique values, one more than the 10 expected (the extra category is Chicken Breast), no missing values. I'll convert the Chicken Breast category to Chicken.
- servings is string, 6 unique values, no missing values. I'll convert it to a numeric type but treat it as an ordinal category.
- high_traffic is string, 373 missing values. I'll convert it to boolean, treating missing values as low-traffic recipes.
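The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration on a hypothetical toy frame, not the notebook's actual cleaning cell: the helper name `clean_recipes`, the sample values, the assumption that non-numeric servings look like "4 as a snack", and the assumption that any non-missing high_traffic label means high traffic are all mine.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that mimic the raw columns; values are illustrative only
raw = pd.DataFrame({
    "recipe": [1, 2, 3, 4],
    "calories": [np.nan, 35.5, 914.3, 97.0],
    "carbohydrate": [np.nan, 38.6, 42.7, 30.6],
    "sugar": [np.nan, 0.7, 3.1, 38.6],
    "protein": [np.nan, 0.9, 2.9, 0.0],
    "category": ["Pork", "Potato", "Chicken Breast", "Beverages"],
    "servings": ["6", "4", "1", "4 as a snack"],
    "high_traffic": ["High", "High", np.nan, "High"],
})

nutrition_cols = ["calories", "carbohydrate", "sugar", "protein"]


def clean_recipes(df):
    """Apply the cleaning steps from the validation summary (hypothetical helper)."""
    # Drop rows where any nutritional value is missing
    cleaned = df.dropna(subset=nutrition_cols).copy()
    # Merge the extra category into its parent grouping
    cleaned["category"] = cleaned["category"].replace("Chicken Breast", "Chicken")
    # Assumption: keep the leading digits of each servings string
    cleaned["servings"] = (
        cleaned["servings"].astype(str).str.extract(r"(\d+)", expand=False).astype(int)
    )
    # Assumption: any non-missing label marks a high-traffic recipe
    cleaned["high_traffic"] = cleaned["high_traffic"].notna()
    return cleaned


cleaned = clean_recipes(raw)
print(cleaned.dtypes)
```

If the real servings column contains other text patterns, the regular expression would need adjusting before the integer conversion.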

# validate for column types
recipe_site_traffic.info()

# validate for missing values
recipe_site_traffic.isna().sum()

# validate for duplicate data
recipe_site_traffic.duplicated().sum()

# validate that recipe id has 947 unique values
recipe_site_traffic['recipe'].nunique()

# validate for negative and extreme values in calories, carbohydrate, sugar, protein
recipe_site_traffic[['calories', 'carbohydrate', 'sugar', 'protein']].describe()
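The summary statistics show negatives only indirectly, through the min row. An explicit count per column is one way to confirm the "no negative values" claim. The sketch below uses a small hypothetical frame (`sample`) in place of the real dataset; missing values compare as False, so they are not counted as negative.

```python
import pandas as pd

# Hypothetical toy values standing in for the four nutritional columns
sample = pd.DataFrame({
    "calories": [35.5, 914.3, None],
    "carbohydrate": [38.6, 42.7, None],
    "sugar": [0.7, 3.1, None],
    "protein": [0.9, 2.9, None],
})

# Count strictly negative entries per column (NaN < 0 evaluates to False)
negative_counts = (sample < 0).sum()
print(negative_counts)
```

On the real data, the same expression applied to the four columns of recipe_site_traffic should return zero for each column if the summary holds.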

# validate category against the 10 expected values (food groupings)
categories = ['Lunch/Snacks', 'Beverages', 'Potato', 'Vegetable', 'Meat', 'Chicken', 'Pork', 'Dessert', 'Breakfast', 'One Dish Meal']
print(recipe_site_traffic['category'].nunique())
print(recipe_site_traffic['category'].unique())

# find extra category
set(recipe_site_traffic['category'].unique()) - set(categories)

# validate servings
recipe_site_traffic['servings'].unique()

