Finding the best chocolate bars
Now let's move on to the competition and challenge.
Background
You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers.
After finding valuable chocolate bar ratings online, you need to explore whether the chocolate bars with the highest ratings share any characteristics that could help you narrow your search for suppliers (e.g., cacao percentage or bean country of origin).
The data
Your team created a file with the following information (source):
- "id" - ID number of the review
- "manufacturer" - Name of the bar manufacturer
- "company_location" - Location of the manufacturer
- "year_reviewed" - Year the bar was reviewed (2006 to 2021)
- "bean_origin" - Country of origin of the cacao beans
- "bar_name" - Name of the chocolate bar
- "cocoa_percent" - Cocoa content of the bar (%)
- "num_ingredients" - Number of ingredients
- "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
- "review" - Summary of the most memorable characteristics of the chocolate bar
- "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding
Acknowledgments: Brady Brelinski, Manhattan Chocolate Society
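To compare bars by the named categories of the rating scale rather than by raw scores, the ratings can be binned with `pd.cut`. This is a minimal sketch: the helper name `rating_band` and the sample ratings are hypothetical, and it assumes ratings lie between 1.0 and 5.0 in quarter-point steps, so the gaps in the published ranges (e.g. between 3.49 and 3.5) never come into play.

```python
import numpy as np
import pandas as pd

# Band edges follow the rating scale in the data description. right=False makes
# each bin include its lower edge, so 3.5 is "Highly Recommended", not "Recommended".
RATING_BINS = [1.0, 2.0, 3.0, 3.5, 4.0, np.inf]
RATING_LABELS = [
    "Unpleasant",
    "Disappointing",
    "Recommended",
    "Highly Recommended",
    "Outstanding",
]


def rating_band(ratings: pd.Series) -> pd.Series:
    """Map numeric ratings to the named bands of the review scale (hypothetical helper)."""
    return pd.cut(ratings, bins=RATING_BINS, labels=RATING_LABELS, right=False)


# Hypothetical ratings, only to illustrate the mapping.
sample = pd.Series([1.5, 2.75, 3.0, 3.5, 3.75, 4.0])
bands = rating_band(sample)
print(bands.tolist())
```

On the real data, `df["rating_band"] = rating_band(df["rating"])` would add the category next to each score.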
Preparation and Exploratory Analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
df = pd.read_csv('data/chocolate_bars.csv')
df.head()
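The `ingredients` column packs several codes into one string, which makes it awkward to ask whether, say, bars with added vanilla or lecithin rate differently. The sketch below expands it into one 0/1 column per code with `Series.str.get_dummies`. It assumes (not verified here) that the codes are comma-separated, like `"B,S,C"`; check `df["ingredients"].head()` and adjust `sep` if the file differs. `ingredient_flags` is a hypothetical helper name.

```python
import pandas as pd


def ingredient_flags(ingredients: pd.Series) -> pd.DataFrame:
    """Return one 0/1 column per ingredient code (e.g. B, S, S*, C, V, L, Sa).

    Assumption: codes are comma-separated; missing values become all-zero rows.
    """
    # Remove stray spaces so "B, S" and "B,S" produce the same codes.
    cleaned = ingredients.fillna("").str.replace(" ", "", regex=False)
    # Exact matching on the separator keeps "S" and "S*" as distinct columns.
    return cleaned.str.get_dummies(sep=",")


# Hypothetical rows, only to illustrate the output shape.
sample = pd.Series(["B,S,C", "B,S", "B,S*,C,Sa", None])
flags = ingredient_flags(sample)
print(flags)
```

On the real data, `df.join(ingredient_flags(df["ingredients"]))` would place the flags next to `rating` for further comparison.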
data = df.drop(labels=['id', 'year_reviewed'], axis=1)
data.describe()
df.head(10)
# Since pandas 2.0, corr() raises on text columns unless numeric_only=True
corr = data.corr(numeric_only=True)
plt.figure(figsize=(7,7))
sns.heatmap(corr, cmap="Greens",annot=True)
plt.show()
sns.pairplot(data)
plt.show()
1. Countries & Rating
df_origin=df.groupby("bean_origin")["rating"].mean()
df_origin.describe()
df_origin2 = df_origin.sort_values()
df_origin2
2. Countries & Review
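Sections 1 and 2 compute the average rating and the number of reviews per origin separately. An origin with only one or two reviewed bars can reach the top of the average-rating ranking by chance, so it may help to compute both numbers in a single `groupby` and drop sparse origins. This is a minimal sketch: `origin_summary` and the default `min_reviews=10` are arbitrary illustrative choices, not part of the original analysis, and it counts rated bars rather than the `review` column.

```python
import pandas as pd


def origin_summary(df: pd.DataFrame, min_reviews: int = 10) -> pd.DataFrame:
    """Average rating and number of rated bars per bean origin, keeping only
    origins with at least `min_reviews` bars (the threshold is arbitrary)."""
    summary = (
        df.groupby("bean_origin")["rating"]
        .agg(avg_rating="mean", n_reviews="count")
        .sort_values("avg_rating", ascending=False)
    )
    return summary[summary["n_reviews"] >= min_reviews]


# Hypothetical rows, only to illustrate the filter.
toy = pd.DataFrame({
    "bean_origin": ["Peru", "Peru", "Peru", "Fiji", "Ghana", "Ghana"],
    "rating": [3.5, 3.25, 3.0, 4.0, 3.0, 3.0],
})
summary_toy = origin_summary(toy, min_reviews=2)
print(summary_toy)
```

Here "Fiji" has the highest average but only one bar, so it is dropped; on the real data the call would be `origin_summary(df)`.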
df_review = df.groupby("bean_origin")["review"].agg("count")
df_review.describe()
df_review
3. Average Rating & Total Number of Reviews by Country of Origin
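# Reorder the review counts to match the bars (sorted by average rating) and plot
# them at the bars' integer positions, so each point sits above its own country
df_review2 = df_review.reindex(df_origin2.index)
sns.set(rc={'figure.figsize': (12, 6)})
ax = sns.barplot(x=df_origin2.index, y=df_origin2.values)
plt.xticks(rotation=90)
# Secondary y-axis for the review counts, sharing the x-axis with the bars
ax2 = ax.twinx()
ax2.plot(np.arange(len(df_review2)), df_review2.values, marker='o', color='crimson', lw=3)
ax.set_xlabel('Country of Origin')
ax.set_ylabel('Average Rating')
ax2.set_ylabel('Total Number of Reviews')
ax.set_title('Average Rating & Total Number of Reviews by Country of Origin')
plt.show()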
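Returning to the original question of whether top-rated bars share characteristics such as cacao percentage, one simple check is to compare the typical cocoa percentage and ingredient count of highly rated bars against the rest. This sketch uses 3.5 (the start of "Highly Recommended") as the cut-off and medians to limit the effect of outliers; both are choices made here, not the author's. `compare_top_rated` is a hypothetical helper, the toy values are made up, and it assumes `cocoa_percent` is already numeric rather than a string such as `"70%"`.

```python
import pandas as pd


def compare_top_rated(df: pd.DataFrame, threshold: float = 3.5) -> pd.DataFrame:
    """Median cocoa percentage and ingredient count for bars rated at or above
    `threshold` versus all other bars. Assumes cocoa_percent is numeric."""
    is_top = df["rating"] >= threshold
    groups = is_top.map({True: "top_rated", False: "others"}).rename("group")
    return df.groupby(groups)[["cocoa_percent", "num_ingredients"]].median()


# Hypothetical rows, only to illustrate the comparison.
toy = pd.DataFrame({
    "cocoa_percent": [70, 72, 75, 85, 90, 65],
    "num_ingredients": [3, 2, 3, 4, 5, 4],
    "rating": [3.75, 4.0, 3.5, 2.75, 2.5, 3.0],
})
comparison = compare_top_rated(toy)
print(comparison)
```

On the real data the call would be `compare_top_rated(df)`; the same grouping works for any other numeric column worth checking.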