Recipe Site Traffic Prediction
Recipe Site Traffic
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report

# Load the data
recipes = pd.read_csv('recipe_site_traffic_2212.csv')
recipes.head()
recipes.shape

Data Validation
The dataset has 947 rows and 8 columns. I validated every variable against the data dictionary and cleaned those that did not match it, since not all columns were as described.
- recipe: 947 unique numeric values with no missing values, matching the description. No cleaning is needed.
- calories: Numeric values with 52 missing values. Missing values imputed with the mean.
- carbohydrate: Numeric values with 52 missing values. Missing values imputed with the mean.
- sugar: Numeric values with 52 missing values. Missing values imputed with the mean.
- protein: Numeric values with 52 missing values. Missing values imputed with the mean.
- category: 11 categories with no missing values, which does not match the description. Replaced "Chicken Breast" with "Chicken" to match the 10 required recipe types.
- servings: Object values with no missing values. Replaced "4 as a snack" and "6 as a snack" with 4 and 6 respectively, and changed the data type to numerical (int64).
- high_traffic: Character values with 373 missing values. Missing values imputed with the value "Low".
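The cleaning steps listed above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the notebook's actual cleaning code: the helper name `clean_recipes` and the toy DataFrame are hypothetical, and the toy values are made up so the example runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the recipes table (values are invented for illustration).
toy = pd.DataFrame({
    "recipe": [1, 2, 3, 4],
    "calories": [100.0, np.nan, 300.0, 200.0],
    "carbohydrate": [10.0, 20.0, np.nan, 30.0],
    "sugar": [1.0, np.nan, 3.0, 2.0],
    "protein": [5.0, 15.0, np.nan, 10.0],
    "category": ["Chicken Breast", "Chicken", "Pork", "Dessert"],
    "servings": ["4", "4 as a snack", "6 as a snack", "2"],
    "high_traffic": ["High", np.nan, "High", np.nan],
})


def clean_recipes(df):
    """Hypothetical helper applying the validation fixes described above."""
    out = df.copy()

    # Impute missing nutritional values with each column's mean.
    nutrition = ["calories", "carbohydrate", "sugar", "protein"]
    out[nutrition] = out[nutrition].fillna(out[nutrition].mean())

    # Merge "Chicken Breast" into "Chicken".
    out["category"] = out["category"].replace("Chicken Breast", "Chicken")

    # Strip the " as a snack" suffix and convert servings to int64.
    out["servings"] = (
        out["servings"].str.replace(" as a snack", "", regex=False).astype("int64")
    )

    # Treat a missing high_traffic label as "Low".
    out["high_traffic"] = out["high_traffic"].fillna("Low")
    return out


cleaned = clean_recipes(toy)
print(cleaned)
```

On the real data the same function would be called as `clean_recipes(recipes)`. Working on a copy keeps the raw table available for comparison with the cleaned one.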
# Check variable data types
recipes.info()

# Check for missing values
recipes.isnull().sum()
recipes['high_traffic'].unique()

# Percentage of missing values
print("Missing values for calories: {:.2f}%".format(100 * recipes['calories'].isnull().sum() / len(recipes)))
print("Missing values for high_traffic: {:.2f}%".format(100 * recipes['high_traffic'].isnull().sum() / len(recipes)))

# Check for duplicates
recipes.duplicated().sum()

# Check for outliers
recipes.describe()
recipes['servings'].dtype
recipes['servings'].value_counts()
recipes['category'].value_counts()
recipes['high_traffic'].unique()
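The outlier check above relies on reading the summary statistics from `describe()`. As a complement, outliers can be counted explicitly with the common 1.5 × IQR rule. This is a sketch of a swapped-in technique, not something the notebook does: the helper name `iqr_outlier_counts`, the multiplier `k`, and the toy values are all assumptions for illustration.

```python
import pandas as pd


def iqr_outlier_counts(df, columns, k=1.5):
    """Hypothetical helper: count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    counts = {}
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # Flag values below the lower fence or above the upper fence.
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(mask.sum())
    return pd.Series(counts, name="n_outliers")


# Toy data (invented): one extreme calories value, no extreme sugar values.
toy = pd.DataFrame({
    "calories": [100, 110, 120, 130, 140, 5000],
    "sugar": [1, 2, 3, 4, 5, 6],
})

outliers = iqr_outlier_counts(toy, ["calories", "sugar"])
print(outliers)
```

On the real data this could be applied to the four nutritional columns. Whether flagged values should be removed is a separate judgment call, since skewed nutritional data can contain legitimate extremes.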