
🍫 Finding the best chocolate bars 🍫

A DataCamp Challenge



📖 Background

You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers.

After finding valuable chocolate bar ratings online, you need to explore whether the chocolate bars with the highest ratings share any characteristics that could help you narrow your search for suppliers (e.g., cacao percentage, bean country of origin, etc.).

💾 The data

Your team created a file with the following information (source):
  • "id" - id number of the review
  • "manufacturer" - Name of the bar manufacturer
  • "company_location" - Location of the manufacturer
  • "year_reviewed" - From 2006 to 2021
  • "bean_origin" - Country of origin of the cacao beans
  • "bar_name" - Name of the chocolate bar
  • "cocoa_percent" - Cocoa content of the bar (%)
  • "num_ingredients" - Number of ingredients
  • "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
  • "review" - Summary of most memorable characteristics of the chocolate bar
  • "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding
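
The rating scale above can be turned into a categorical column, which is handy for grouping later on. Here is a minimal sketch using `pd.cut`, assuming left-closed bins so that a value such as 3.5 falls into "Highly Recommended". The sample ratings and the names `ratings`, `bins`, `labels` and `rating_category` are made up for illustration and are not part of the dataset or the original analysis.

```python
import numpy as np
import pandas as pd

# Toy ratings for illustration only (not taken from the dataset).
ratings = pd.Series([1.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 5.0], name="rating")

# Left-closed bins following the scale: [1, 2), [2, 3), [3, 3.5), [3.5, 4), [4, inf).
# The last edge is infinity so that a rating of exactly 5.0 is still included.
bins = [1.0, 2.0, 3.0, 3.5, 4.0, np.inf]
labels = ["Unpleasant", "Disappointing", "Recommended",
          "Highly Recommended", "Outstanding"]

# right=False makes each bin include its left edge and exclude its right edge.
rating_category = pd.cut(ratings, bins=bins, labels=labels, right=False)
print(pd.DataFrame({"rating": ratings, "category": rating_category}))
```

On the real data, the same call could be applied to `df['rating']` once the file is loaded.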

Acknowledgments: Brady Brelinski, Manhattan Chocolate Society

📖 Table of contents

  1. What is the average rating by country of origin?
  2. How many bars were reviewed for each of those countries?
  3. Create plots to visualize findings for questions 1 and 2.
  4. Is the cacao bean's origin an indicator of quality?
  5. How does cocoa content relate to rating? What is the average cocoa content for bars with higher ratings (above 3.5)?
  6. Compare the average rating of bars with and without lecithin (L in the ingredients).
  7. Summarize your findings.

# Load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

custom_params = {"axes.spines.right": False,
                 "axes.spines.left": False,
                 "axes.spines.top": False}

sns.set_theme(style="white", 
              palette='magma_r', 
              rc=custom_params)

colors = sns.color_palette('magma_r', 6).as_hex()

# Install dython (notebook shell commands)
!/usr/bin/python3 -m pip install -q --upgrade pip
!pip install -q dython
from dython.nominal import associations

# Load the data and inspect it
df = pd.read_csv('./data/chocolate_bars.csv')
df.head()
df.info()

1. What is the average rating by country of origin?


Let's find the average rating by country of origin. First, I check that the country names are consistent:

countries = df['bean_origin'].unique()
countries.sort()
print('List of unique countries:\n', countries)

The list contains "DR Congo", which we rename to "Congo" so that both labels are treated as a single origin.

df['bean_origin'] = df['bean_origin'].replace('DR Congo', 'Congo')
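
A quick way to confirm that such a replacement worked is to look at the unique labels again. The sketch below does this on a small made-up frame (`toy` and `remaining` are hypothetical names, and the values are not from the dataset); the same check could be run on `df['bean_origin']`.

```python
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({"bean_origin": ["Congo", "DR Congo", "Peru", "DR Congo"]})

# Same replacement as in the analysis, applied to the toy column.
toy["bean_origin"] = toy["bean_origin"].replace("DR Congo", "Congo")

# After the replacement, 'DR Congo' should no longer appear among the labels.
remaining = sorted(toy["bean_origin"].unique())
print(remaining)
```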

Now let's calculate the average rating and the number of reviews by country:

df1 = df.groupby('bean_origin')['rating'].agg(['mean', 'count']).sort_values(by='mean', ascending=False)
print("Let's display the top 10:\n")
df1.head(10)

Conclusion 1: Beans from Tobago have the highest average rating, but only 2 bars from that origin were rated.
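
Because a top average based on only 2 bars is fragile, one option is to rank only the origins with a minimum number of reviews. The sketch below illustrates the idea on made-up data: the origins "A", "B", "C", their ratings, and the `min_reviews` threshold are all assumptions chosen for the example, not values from the original analysis.

```python
import pandas as pd

# Toy data for illustration only; origins and ratings are made up.
toy = pd.DataFrame({
    "bean_origin": ["A", "A", "B", "B", "B", "B", "C"],
    "rating":      [4.0, 3.5, 3.0, 3.5, 3.25, 3.25, 4.0],
})

min_reviews = 3  # hypothetical threshold, not from the original analysis

# Mean rating and number of reviews per origin, best average first.
summary = (
    toy.groupby("bean_origin")["rating"]
       .agg(["mean", "count"])
       .sort_values(by="mean", ascending=False)
)

# Keep only origins with at least `min_reviews` rated bars.
robust = summary[summary["count"] >= min_reviews]
print(robust)
```

With the real data, the same filter could be applied to `df1`, which already holds the `mean` and `count` columns. The right threshold is a judgment call and depends on how many origins should remain in the ranking.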

2. How many bars were reviewed for each of those countries?


Let's count the number of bar names per country: