
Recipe Site Traffic

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report
# Load the data
recipes = pd.read_csv('recipe_site_traffic_2212.csv')
recipes.head()
recipes.shape

Data Validation

The dataset has 947 rows and 8 columns. I validated every variable and made changes where needed; not all columns matched the data dictionary.

  • recipe: 947 unique numeric values without missing values, same as the description. No cleaning is needed.
  • calories: Numeric values with 52 missing values. Missing values imputed with the mean.
  • carbohydrate: Numeric values with 52 missing values. Missing values imputed with the mean.
  • sugar: Numeric values with 52 missing values. Missing values imputed with the mean.
  • protein: Numeric values with 52 missing values. Missing values imputed with the mean.
  • category: 11 categories without missing values, one more than described. Replaced Chicken Breast with Chicken to reduce to the 10 required recipe types.
  • servings: Object values without missing values. Replaced '4 as a snack' and '6 as a snack' with 4 and 6 respectively, and converted the data type to numeric (int64).
  • high_traffic: Character values with 373 missing values. Missing values imputed with the value Low.
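The cleaning steps listed above can be sketched as follows. This is a minimal illustration on a toy frame with hypothetical values, assuming the column names and messy entries match the data dictionary described above; the same operations would apply to the full dataset.

```python
import pandas as pd

# Toy frame mimicking the raw columns (hypothetical values for illustration)
recipes = pd.DataFrame({
    'calories': [100.0, None, 300.0],
    'carbohydrate': [10.0, 20.0, None],
    'sugar': [None, 5.0, 7.0],
    'protein': [3.0, None, 9.0],
    'category': ['Chicken Breast', 'Dessert', 'Pork'],
    'servings': ['4 as a snack', '6 as a snack', '2'],
    'high_traffic': ['High', None, 'High'],
})

# Impute missing nutrition values with each column's mean
for col in ['calories', 'carbohydrate', 'sugar', 'protein']:
    recipes[col] = recipes[col].fillna(recipes[col].mean())

# Collapse 'Chicken Breast' into 'Chicken' to get the 10 expected categories
recipes['category'] = recipes['category'].replace('Chicken Breast', 'Chicken')

# Strip the ' as a snack' suffix and convert servings to int64
recipes['servings'] = (recipes['servings']
                       .str.replace(' as a snack', '', regex=False)
                       .astype('int64'))

# Treat missing high_traffic as Low
recipes['high_traffic'] = recipes['high_traffic'].fillna('Low')
```

After these steps the frame has no missing values, `servings` is numeric, and `high_traffic` contains only High/Low.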
# Check variable data types
recipes.info()
# Check for missing values
recipes.isnull().sum()
recipes['high_traffic'].unique()
# Percentage of missing values
print("Missing values for calories: {:.2f}%".format(100 * recipes['calories'].isnull().sum() / len(recipes)))
print("Missing values for high_traffic: {:.2f}%".format(100 * recipes['high_traffic'].isnull().sum() / len(recipes)))
# Check for duplicates
recipes.duplicated().sum()
# Check for outliers
recipes.describe()
recipes['servings'].dtype
recipes['servings'].value_counts()
recipes['category'].value_counts()
recipes['high_traffic'].unique()