Project: What's in an Avocado Toast: A Supply Chain Analysis

    You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

    It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish. The data is stored in the data/ folder: one tab-separated file per ingredient (despite the .csv extension), plus text files listing the relevant category tags for sourdough and olive oil.
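Because the files are read with a tab delimiter, and tag columns such as categories_tags pack several comma-separated tags into a single cell, it can help to see that layout on a toy example first. The following is a minimal sketch using a tiny, hypothetical two-row sample built in memory; the product codes, names, and tags are made up for illustration and are not taken from the data/ folder.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as the data files: tab-separated
# columns, with several comma-separated tags packed into one cell.
sample = (
    "code\tproduct_name_en\tcategories_tags\tcountries\n"
    "111\tExample avocado\ten:fruits,en:avocados\tUnited Kingdom\n"
    "222\tExample lemon\ten:fruits,en:lemons\tFrance, United Kingdom\n"
)

# sep="\t" is what splits each line into columns; dtype=str keeps the
# product codes as text rather than converting them to integers.
df = pd.read_csv(io.StringIO(sample), sep="\t", dtype=str)

# str.split turns each cell into a Python list of individual tags.
tags = df["categories_tags"].str.split(",")
print(tags.tolist())
```

Reading with the default comma separator instead would split on the commas inside the tag cells and misalign the columns, which is why the project passes a tab delimiter throughout.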

    After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

    # Setup
    import pandas as pd
    import numpy as np
    from itertools import chain

    # Category tags that mark a product as relevant, one list per ingredient
    avocado_categories = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods','fr:hass-avocados']
    lemon_categories = ['en:aromatic-plants', 'en:citron', 'en:citrus', 'en:fresh-fruits', 'en:fresh-lemons', 'en:fruits', 'en:lemons', 'en:unwaxed-lemons']
    # The sourdough tags live in a text file with one tag per line; this is the
    # list used by the exploratory sourdough cell below
    relevant_categories = pd.read_csv("data/relevant_sourdough_categories.txt", header=None)[0].to_list()
    # Exploratory pass on the sourdough data; the same steps are wrapped in
    # read_and_filter_data() below so they can be reused for every ingredient
    # Read data (the files are tab-separated despite the .csv extension)
    df = pd.read_csv("./data/sourdough.csv", delimiter="\t")
    # Select subset of columns
    df = df[['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']]
    # Drop rows with no category tags
    df = df[df['categories_tags'].notna()]
    # Split each category tag value into a list
    categories_list = df['categories_tags'].str.split(",")
    # Keep records where at least one relevant category appears
    df = df[categories_list.apply(lambda x: any(category in x for category in relevant_categories))]
    # Split the countries column into lists (missing values become empty strings)
    countries_list = df['countries'].str.replace(", ", ",").str.split(',').fillna("")
    # Keep records listed for the United Kingdom (English or French country name)
    df = df[countries_list.apply(lambda x: "United Kingdom" in x or "Royaume-Uni" in x)]
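The row-wise apply calls above can also be expressed with vectorized pandas operations. The following is a minimal sketch on hypothetical in-memory rows; the helper name has_any_tag is illustrative and not part of the project. Filtering on countries_tags assumes that Open Food Facts normalizes country names to language-independent tags such as en:united-kingdom, so verify this against the real files before relying on it.

```python
import pandas as pd

# Hypothetical rows mimicking the categories_tags and countries_tags columns.
df = pd.DataFrame(
    {
        "code": ["1", "2", "3", "4"],
        "categories_tags": [
            "en:fruits,en:avocados",
            "en:snacks",
            "en:breads,en:sourdough-breads",
            None,
        ],
        "countries_tags": [
            "en:united-kingdom",
            "en:united-kingdom",
            "en:france",
            "en:united-kingdom",
        ],
    }
)

relevant_categories = ["en:avocados", "en:sourdough-breads"]


def has_any_tag(column: pd.Series, wanted: list[str]) -> pd.Series:
    """Return a boolean mask: True where the comma-separated cell contains any wanted tag."""
    # explode() yields one row per tag while keeping the original index
    # (missing cells stay as a single NaN row), so per-tag matches can be
    # collapsed back into one flag per original row.
    exploded = column.str.split(",").explode()
    return exploded.isin(wanted).groupby(level=0).any()


mask = has_any_tag(df["categories_tags"], relevant_categories) & has_any_tag(
    df["countries_tags"], ["en:united-kingdom"]
)
filtered = df[mask]
print(filtered["code"].tolist())
```

Matching on normalized tags, if they hold up in the real data, would avoid listing every language's spelling of a country, such as both "United Kingdom" and "Royaume-Uni".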
    def read_and_filter_data(filepath, relevant_categories):
        """Load one ingredient file, keep relevant UK records, and find the most common origin.

        Returns the filtered DataFrame and the most frequent cleaned origin tag.
        """
        # Read data (tab-separated despite the .csv extension)
        df = pd.read_csv("./data/" + filepath, delimiter="\t")
        # Select subset of columns
        df = df[['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']]
        # Drop rows with no category tags
        df = df[df['categories_tags'].notna()]
        # Split each category tag value into a list
        categories_list = df['categories_tags'].str.split(",")
        # Keep records where at least one relevant category appears
        df = df[categories_list.apply(lambda x: any(category in x for category in relevant_categories))]
        # Split the countries column into lists (missing values become empty strings)
        countries_list = df['countries'].str.replace(", ", ",").str.split(',').fillna("")
        # Keep records listed for the United Kingdom (English or French country name)
        df = df[countries_list.apply(lambda x: "United Kingdom" in x or "Royaume-Uni" in x)]
        # Split origin tags into lists, skipping records with no origin
        origins_list = df['origins_tags'].str.replace(", ", ",").str.split(',').dropna()

        # Flatten the per-product lists into a single Series of origin tags
        origins = pd.Series(list(chain.from_iterable(origins_list)))

        # Clean origin tag strings before counting: drop the language prefix
        # (e.g. "en:") and replace hyphens with spaces. regex=True is required,
        # because pandas 2.0+ treats the pattern as a literal string by default
        origins = origins.str.replace(r"^..:", "", regex=True).str.replace("-", " ", regex=False)

        # Count occurrences of each origin and return the most frequent one
        return df, origins.value_counts().idxmax()
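The origin-cleaning and counting step is easiest to check in isolation. Below is a minimal sketch on a hypothetical origins_tags column (the values are made up for illustration). It demonstrates why the prefix-stripping pattern needs regex=True and why cleaning the tags before counting merges variants of the same origin into a single count.

```python
import pandas as pd

# Hypothetical origins_tags cells; each may hold several comma-separated tags.
origins_tags = pd.Series(["en:spain", "en:spain,en:south-africa", None, "en:peru"])

# One row per tag; rows with no origin recorded are dropped.
origins = origins_tags.str.split(",").explode().dropna()

# Strip the two-letter language prefix and turn hyphens into spaces.
# regex=True matters: since pandas 2.0, Series.str.replace treats the
# pattern as a literal string by default, so "^..:" would never match.
cleaned = origins.str.replace(r"^..:", "", regex=True).str.replace("-", " ", regex=False)

# value_counts() sorts by frequency; idxmax() returns the most common label.
counts = cleaned.value_counts()
top_origin = counts.idxmax()
print(top_origin)  # spain
```

When two origins tie, idxmax() returns whichever label appears first in the counts, so a tie-break rule may be worth stating explicitly if the result feeds later analyses.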
    
    
    avocado, avocado_origin = read_and_filter_data("avocado.csv", ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods','fr:hass-avocados'])
    
    lemon, lemon_origin = read_and_filter_data("lemon.csv", ['en:aromatic-plants', 'en:citron', 'en:citrus', 'en:fresh-fruits', 'en:fresh-lemons', 'en:fruits', 'en:lemons', 'en:unwaxed-lemons'])
    
    sourdough, sourdough_origin = read_and_filter_data("sourdough.csv", pd.read_csv("data/relevant_sourdough_categories.txt", header=None)[0].to_list())
    
    olive_oil, olive_oil_origin = read_and_filter_data("olive_oil.csv", pd.read_csv("data/relevant_olive_oil_categories.txt", header=None)[0].to_list())
    
    results = [avocado_origin, lemon_origin, sourdough_origin, olive_oil_origin]
    
    #salt, salt_origin = read_and_filter_data("salt_flakes.csv", ['en:edible-common-salt', 'en:salts', 'en:sea-salts'])
    
    print(sourdough_origin)
    print(olive_oil_origin)
    print(lemon_origin)
    print(avocado_origin)