What's in an Avocado Toast: A Supply Chain Analysis
You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!
In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.
Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as avocado.csv, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as relevant_avocado_categories.txt, containing only the category tags of interest for that food.
Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the categories_tags column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where categories_tags contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the categories_tags column. There is a column in each CSV file called origins_tags, which contains strings for the country of origin of each item.
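The filtering step described above can be sketched on a small, made-up DataFrame. The rows and the relevant_tags list below are invented for illustration and are not taken from the Open Food Facts files; only the column names categories_tags and origins_tags come from the project description.

```python
import pandas as pd

# Hypothetical toy data; the real rows come from the project's CSV files
toy = pd.DataFrame({
    'categories_tags': [
        'en:fruits,en:avocados',
        'en:snacks,en:crisps',
        None,
        'en:plant-based-foods,en:avocados',
    ],
    'origins_tags': ['en:peru', 'en:france', 'en:spain', 'en:peru'],
})

# Hypothetical list of relevant tags (in the project this is read from the TXT file)
relevant_tags = ['en:avocados']

# Drop rows without category tags, then split each tag string into a list
toy = toy.dropna(subset=['categories_tags']).copy()
toy['categories_list'] = toy['categories_tags'].str.split(',')

# Keep rows where at least one tag appears in the relevant list
mask = toy['categories_list'].apply(
    lambda tags: any(tag in relevant_tags for tag in tags)
)
filtered = toy[mask]

# Most frequent origin among the remaining rows
top_origin = filtered['origins_tags'].value_counts().index[0]
print(top_origin)
```

Because each row carries several tags, the check is "any tag matches" and not an equality test on the whole string.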
After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.
# Read tab-delimited data
import pandas as pd
avocado = pd.read_csv('data/avocado.csv', sep='\t')
olive_oil = pd.read_csv('data/olive_oil.csv', sep='\t')
sourdough = pd.read_csv('data/sourdough.csv', sep='\t')
# Subset large DataFrame to include only relevant columns
columns_data = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']
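# Take explicit copies so that columns added later do not trigger SettingWithCopyWarning
avocado = avocado[columns_data].copy()
olive_oil = olive_oil[columns_data].copy()
sourdough = sourdough[columns_data].copy()
# Gather relevant categories data for avocados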
# The with statement closes each file automatically, so no explicit close() is needed
with open("data/relevant_avocado_categories.txt", "r") as file:
    relevant_avocado_categories = file.read().splitlines()
# Gather relevant categories data for olive_oil
with open("data/relevant_olive_oil_categories.txt", "r") as file:
    relevant_olive_oil_categories = file.read().splitlines()
# Gather relevant categories data for sourdough
with open("data/relevant_sourdough_categories.txt", "r") as file:
    relevant_sourdough_categories = file.read().splitlines()
### Filter avocado data using relevant category tags
# Turn a column of comma-separated tags into a column of lists
avocado['categories_list'] = avocado['categories_tags'].str.split(',')
# Drop rows with null values in a particular column
avocado = avocado.dropna(subset=['categories_list'])
# Filter a DataFrame based on a column of lists
avocado = avocado[avocado['categories_list'].apply(lambda tags: any(tag in relevant_avocado_categories for tag in tags))]
# Where do most avocados come from?
avocado_uk = avocado[(avocado['countries'] == 'United Kingdom')]
# Find most common country for avocado origin
avocado_origin = avocado_uk['origins_tags'].value_counts().index[0]
# Remove the literal "en:" prefix (str.removeprefix, Python 3.9+); lstrip("en:") would
# strip any leading 'e', 'n', or ':' characters and could eat into the country name
top_avocado_origin = avocado_origin.removeprefix("en:")
top_avocado_origin = top_avocado_origin.replace('-', ' ')
print('**avocado origins**', '\n', top_avocado_origin, '\n')
print("Top origin country: ", top_avocado_origin)
print("\n")
### Filter olive_oil data using relevant category tags
# Split tags into lists
olive_oil['categories_list'] = olive_oil['categories_tags'].str.split(',')
# Drops rows with null categories data
olive_oil = olive_oil.dropna(subset=['categories_list'])
# Filter data for relevant categories
olive_oil = olive_oil[olive_oil['categories_list'].apply(lambda tags: any(tag in relevant_olive_oil_categories for tag in tags))]
# Filter Data for UK
olive_oil_uk = olive_oil[(olive_oil['countries'] == 'United Kingdom')]
# Find the olive oil origin with the highest count
top_olive_oil_origin = olive_oil_uk['origins_tags'].value_counts().index[0]
# Clean up top origin country of olive oil: remove the literal "en:" prefix
# (str.removeprefix, Python 3.9+; lstrip("en:") strips characters, not a prefix)
top_olive_oil_origin = top_olive_oil_origin.removeprefix("en:")
top_olive_oil_origin = top_olive_oil_origin.replace('-', ' ')
print('**olive_oil origins**', '\n', top_olive_oil_origin, '\n')
print("Top origin country: ", top_olive_oil_origin)
print("\n")
### Filter sourdough data using relevant category tags
# Split tags into lists
sourdough['categories_list'] = sourdough['categories_tags'].str.split(',')
# Drops rows with null categories data
sourdough = sourdough.dropna(subset=['categories_list'])
# Filter data for relevant categories
sourdough = sourdough[sourdough['categories_list'].apply(lambda tags: any(tag in relevant_sourdough_categories for tag in tags))]
# Filter Data for UK
sourdough_uk = sourdough[(sourdough['countries'] == 'United Kingdom')]
# Find the sourdough origin with the highest count
top_sourdough_origin = sourdough_uk['origins_tags'].value_counts().index[0]
# Clean up top origin country of sourdough: remove the literal "en:" prefix
# (str.removeprefix, Python 3.9+; lstrip("en:") strips characters, not a prefix)
top_sourdough_origin = top_sourdough_origin.removeprefix("en:")
top_sourdough_origin = top_sourdough_origin.replace('-', ' ')
print('**sourdough origins**', '\n', top_sourdough_origin, '\n')
print("Top origin country: ", top_sourdough_origin)
print("\n")
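The three ingredient blocks repeat the same steps, so they could be folded into one helper. The following is a minimal sketch, not the project's reference solution: the function name top_origin, its parameters, and the toy rows are assumptions made for illustration. It assumes Python 3.9+ for str.removeprefix.

```python
import pandas as pd

def top_origin(df, relevant_categories, country='United Kingdom'):
    """Return the most common origin for relevant products sold in `country`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Drop rows without category tags and split the rest into lists
    df = df.dropna(subset=['categories_tags']).copy()
    df['categories_list'] = df['categories_tags'].str.split(',')
    # Keep rows with at least one relevant tag (a set makes lookups fast)
    relevant = set(relevant_categories)
    df = df[df['categories_list'].apply(lambda tags: any(t in relevant for t in tags))]
    # Keep rows whose countries value is exactly the requested country
    df = df[df['countries'] == country]
    counts = df['origins_tags'].value_counts()
    if counts.empty:
        return None
    # removeprefix drops only the literal "en:"; lstrip("en:") would turn
    # "en:new-zealand" into "w-zealand" because it strips characters
    return counts.index[0].removeprefix('en:').replace('-', ' ')

# Made-up rows for demonstration; not taken from the project data
toy = pd.DataFrame({
    'categories_tags': ['en:fruits,en:avocados', 'en:avocados', 'en:avocados',
                        'en:snacks', 'en:avocados', None],
    'countries': ['United Kingdom', 'United Kingdom', 'United Kingdom',
                  'United Kingdom', 'France', 'United Kingdom'],
    'origins_tags': ['en:new-zealand', 'en:new-zealand', 'en:peru',
                     'en:peru', 'en:peru', 'en:peru'],
})

result = top_origin(toy, ['en:avocados'])
print("Top origin country: ", result)
```

With the real data, the call would look like top_origin(avocado, relevant_avocado_categories), and likewise for the other two ingredients. Returning None when no rows survive the filters avoids the IndexError that index[0] raises on an empty result.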