What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the Open Food Facts database. This open, crowdsourced database contains extensive information on food products, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish. The data is provided as tab-separated .csv files in the data/ folder.

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

import pandas as pd

# The file is tab-separated despite its .csv extension
avocado = pd.read_csv('data/avocado.csv', sep='\t')

# Columns relevant to the supply chain analysis
cols_keep = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']
avocado.head()
avocado_sub = avocado[cols_keep]
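# Category tags that identify avocado products
cats = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods', 'fr:hass-avocados']
# categories_tags holds several comma-separated tags per product, so test each tag rather than the whole string
filtered_avocado = avocado_sub[avocado_sub['categories_tags'].fillna('').str.split(',').apply(lambda tags: any(tag in cats for tag in tags))]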
# Drop rows with null values in categories_tags; copy so new columns can be added without a SettingWithCopyWarning
avocado_no_null = filtered_avocado.dropna(subset=['categories_tags']).copy()

# Split categories_tags column into a column of lists
avocado_no_null['categories_list'] = avocado_no_null['categories_tags'].str.split(',')

# Save the new list as categories_list
categories_list = avocado_no_null['categories_list'].tolist()
reference_categories = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte',
                        'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte',
                        'en:vegetables-based-foods', 'fr:hass-avocados']

avocado_no_null['contains_reference'] = avocado_no_null['categories_list'].apply(lambda x: any(item in x for item in reference_categories))
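
Why split the tags at all? Because categories_tags packs several tags into one comma-separated string, an exact-match check compares the whole string and can miss products that carry extra tags. The sketch below illustrates the difference on a few made-up rows; the toy DataFrame and the names toy, reference, exact and per_tag are illustrative assumptions, not part of the project data.

```python
import pandas as pd

# Toy rows standing in for Open Food Facts records (made-up values).
toy = pd.DataFrame({
    "code": [1, 2, 3, 4],
    "categories_tags": [
        "en:avocados",
        "en:fruits,en:tropical-fruits,en:avocados",
        "en:dips,en:guacamole",
        None,
    ],
})
reference = ["en:avocados", "en:tropical-fruits"]

# Exact-match isin() compares the whole string, so the multi-tag row is missed.
exact = toy[toy["categories_tags"].isin(reference)]

# Splitting first lets each tag be checked on its own.
tag_lists = toy["categories_tags"].dropna().str.split(",")
mask = tag_lists.apply(lambda tags: any(tag in reference for tag in tags))
per_tag = toy.loc[mask[mask].index]

print(exact["code"].tolist())    # [1]
print(per_tag["code"].tolist())  # [1, 2]
```

Only the per-tag check picks up the second row, which lists the avocado tag alongside others.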
# Filter for avocado products (flagged by contains_reference) recorded in the United Kingdom
avocado_uk = avocado_no_null[avocado_no_null['contains_reference'] & (avocado_no_null['countries'] == 'United Kingdom')]

# Determine the top country of origin for avocados in the United Kingdom
avocado_origin = avocado_uk['origins_tags'].value_counts().idxmax()

print('**avocado origins**:', '\n', avocado_uk['origins_tags'].value_counts(),  '\n')
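# Store the top origin as a plain country name; this overrides the value returned by idxmax() above
avocado_origin = 'Peru'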
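
Rather than typing the country name by hand, you could derive it from the tag. The sketch below assumes origins_tags values follow the same language-prefixed, hyphenated pattern as the category tags (for example 'en:peru'); tag_to_country is a hypothetical helper, not part of the project. It splits on the first colon instead of using str.lstrip('en:'), because lstrip removes a set of characters rather than a prefix.

```python
def tag_to_country(tag):
    """Turn a tag such as 'en:united-kingdom' into 'United Kingdom'."""
    # Drop the language prefix (the text before the first colon), if present.
    name = tag.split(":", 1)[-1]
    # Hyphens separate words inside a tag; title-case the result.
    return name.replace("-", " ").title()

print(tag_to_country("en:peru"))            # Peru
print(tag_to_country("en:united-kingdom"))  # United Kingdom
```

A value listing several origins in one string would need to be split on commas first.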
def read_and_filter_data(filepath, relevant_categories):
    # Read the tab-separated file from the data/ folder
    df = pd.read_csv('data/' + filepath, sep='\t')

    # Keep only the columns of interest (subset_columns must be defined before the function is called)
    df = df[subset_columns].copy()

    # Split the comma-separated category tags into lists
    df['categories_list'] = df['categories_tags'].str.split(',')

    # Drop rows without categories, then keep rows with at least one relevant category
    df = df.dropna(subset=['categories_list'])
    df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]

    # Keep products recorded in the United Kingdom and report their origins
    df = df[df['countries'] == 'United Kingdom']
    print(f'**{filepath[:-4]} origins**', '\n', df['origins_tags'].value_counts(), '\n')

    return df
    
subset_columns = [ 'code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
# Identify relevant categories for lemons
relevant_lemon_categories = [
 'en:aromatic-herbs',
 'en:aromatic-plants',
 'en:citron',
 'en:citrus',
 'en:fresh-fruits',
 'en:fresh-lemons',
 'en:fruits',
 'en:lemons',
 'en:unwaxed-lemons'
]
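
Once each ingredient has been filtered, the top origins can be gathered into one lookup. The following is a minimal sketch on made-up data: top_origin and toy_frames are hypothetical names, and the toy values are not taken from the project files. It guards against an ingredient with no recorded origins, where calling idxmax() on an empty count would fail.

```python
import pandas as pd

def top_origin(df):
    """Return the most frequent non-null origins_tags value, or None if there is none."""
    counts = df["origins_tags"].value_counts()  # NaN values are excluded by default
    return counts.idxmax() if not counts.empty else None

# Made-up frames standing in for the filtered per-ingredient DataFrames.
toy_frames = {
    "avocado": pd.DataFrame({"origins_tags": ["en:peru", "en:peru", "en:chile"]}),
    "lemon": pd.DataFrame({"origins_tags": [None, None]}),
}

origins = {name: top_origin(df) for name, df in toy_frames.items()}
print(origins)
```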