
Finding the best chocolate bars

Now let's move on to the competition challenge.

📖 Background

You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers.

After finding valuable chocolate bar ratings online, you need to explore whether the chocolate bars with the highest ratings share any characteristics that could help you narrow your search for suppliers (e.g., cacao percentage, bean country of origin, etc.).

💾 The data

Your team created a file with the following information (source):
  • "id" - id number of the review
  • "manufacturer" - Name of the bar manufacturer
  • "company_location" - Location of the manufacturer
  • "year_reviewed" - From 2006 to 2021
  • "bean_origin" - Country of origin of the cacao beans
  • "bar_name" - Name of the chocolate bar
  • "cocoa_percent" - Cocoa content of the bar (%)
  • "num_ingredients" - Number of ingredients
  • "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
  • "review" - Summary of most memorable characteristics of the chocolate bar
  • "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding
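
The rating bands listed above can be turned into a categorical column, which makes it easier to count or filter bars by band. The following is a minimal sketch, not part of the original notebook: the helper name `rating_band` and the constants are hypothetical, and the top band is left open-ended so that a rating of exactly 5.0 is still labelled.

```python
import numpy as np
import pandas as pd

# Lower edge of each band from the data description; the last edge is open
# (np.inf) so that a rating of exactly 5.0 falls into the top band.
BAND_EDGES = [1.0, 2.0, 3.0, 3.5, 4.0, np.inf]
BAND_LABELS = [
    "Unpleasant",
    "Disappointing",
    "Recommended",
    "Highly Recommended",
    "Outstanding",
]


def rating_band(ratings: pd.Series) -> pd.Series:
    """Map numeric ratings to descriptive bands (hypothetical helper)."""
    # right=False makes each interval closed on the left: [3.5, 4.0) etc.
    return pd.cut(ratings, bins=BAND_EDGES, labels=BAND_LABELS, right=False)


# Made-up ratings used only to illustrate the mapping
sample = pd.Series([1.5, 2.75, 3.25, 3.75, 4.0])
bands = rating_band(sample)
print(bands.tolist())
```

On the real data this could be applied as `df["rating_band"] = rating_band(df["rating"])`, assuming `df` has been loaded as in the cells that follow.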

Acknowledgments: Brady Brelinski, Manhattan Chocolate Society

Preparation and Exploratory Analysis

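# Libraries used throughout the analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the ratings file and preview the first rows
df = pd.read_csv('data/chocolate_bars.csv')
df.head()

# Drop columns that are not characteristics of the bar itself
data = df.drop(labels=['id', 'year_reviewed'], axis=1)
data.describe()
df.head(10)

# Correlate numeric columns only; text columns such as bar_name cannot be correlated
corr = data.select_dtypes(include="number").corr()
plt.figure(figsize=(7, 7))
sns.heatmap(corr, cmap="Greens", annot=True)
plt.show()

# Pairwise scatter plots of the numeric columns
sns.pairplot(data)
plt.show()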
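
The correlation step only makes sense for numeric columns. As a small illustration on made-up data (the `toy` frame below is an assumption, not the real file), selecting numeric columns before calling `corr()` keeps text columns such as the manufacturer name out of the matrix:

```python
import pandas as pd

# Made-up rows standing in for data/chocolate_bars.csv
toy = pd.DataFrame({
    "manufacturer": ["A", "B", "C", "D"],
    "cocoa_percent": [70.0, 72.0, 85.0, 64.0],
    "num_ingredients": [3, 2, 4, 5],
    "rating": [3.5, 3.75, 2.75, 3.0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric = toy.select_dtypes(include="number")
corr_toy = numeric.corr()
print(corr_toy.round(2))
```

The resulting matrix has one row and one column per numeric feature, with 1.0 on the diagonal.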

1. Countries & Rating

# Mean rating per bean origin
df_origin = df.groupby("bean_origin")["rating"].mean()
df_origin.describe()

# Sort ascending so the best-rated origins appear last
df_origin2 = df_origin.sort_values()
df_origin2

2. Countries & Review

# Number of reviews per bean origin
df_review = df.groupby("bean_origin")["review"].agg("count")
df_review.describe()
df_review
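
The two summaries above (mean rating and review count per origin) can also be computed together with a single named aggregation, which keeps both values side by side for each country. This is a sketch on made-up rows; the `MIN_REVIEWS` threshold is a hypothetical choice, added because an average based on one or two reviews says little about an origin.

```python
import pandas as pd

# Made-up rows standing in for data/chocolate_bars.csv
toy = pd.DataFrame({
    "bean_origin": ["Peru", "Peru", "Peru", "Ghana", "Ghana", "Fiji"],
    "rating": [3.5, 3.0, 3.25, 2.75, 3.25, 4.0],
    "review": ["a", "b", "c", "d", "e", "f"],
})

# One pass over the groups: mean rating and number of reviews per origin
summary = (
    toy.groupby("bean_origin")
    .agg(avg_rating=("rating", "mean"), n_reviews=("review", "count"))
    .sort_values("avg_rating")
)

# Hypothetical cut-off: ignore origins with too few reviews to trust the mean
MIN_REVIEWS = 2
reliable = summary[summary["n_reviews"] >= MIN_REVIEWS]
print(reliable)
```

In this toy sample Fiji has the highest average but only one review, so it is filtered out before any ranking.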

3. Average Rating & Total Number of Reviews by Country of Origin


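# Reorder the review counts to match the rating-sorted origins,
# so the line and the bars refer to the same country at each x position
review_sorted = df_review.reindex(df_origin2.index)

sns.set(rc={'figure.figsize': (12, 6)})
ax = sns.barplot(x=df_origin2.index, y=df_origin2.values)
plt.xticks(rotation=90)

# Second y-axis for the review counts, drawn at the bars' integer positions
ax2 = ax.twinx()
ax2.plot(range(len(review_sorted)), review_sorted.values, marker='o', color='crimson', lw=3)
ax.set_xlabel('Country of Origin')
ax.set_ylabel('Average Rating')
ax2.set_ylabel('Total Number of Reviews')
ax.set_title("Average Rating & Total Number of Reviews by Country of Origin")
plt.show()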