Project: What's in an Avocado Toast: A Supply Chain Analysis

    You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil. It's no small feat that most of these ingredients are readily available in grocery stores.

    In this project, you'll conduct a supply chain analysis of three of the ingredients in an avocado toast (avocado, olive oil, and sourdough bread) using the Open Food Facts database. This openly sourced database contains extensive information about foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain behind a single dish.

    Three pairs of files are provided in the data folder:

    • A CSV file for each ingredient, such as avocado.csv, with data about each food item and its countries of origin. Despite the .csv extension, these files are tab-separated.
    • A TXT file for each ingredient, such as relevant_avocado_categories.txt, listing only the category tags of interest for that food.
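    The sketch below shows how one pair of files might be loaded. It reads the table with `sep='\t'`, as the project code does. To stay self-contained, it uses small in-memory stand-ins (via `io.StringIO`) with made-up rows and tags instead of the real files. It also assumes that the TXT file lists one category tag per line.

```python
import io

import pandas as pd

# Hypothetical stand-ins for data/avocado.csv (tab-separated) and
# data/relevant_avocado_categories.txt; the rows and tags are made up.
csv_text = (
    "code\tproduct_name_en\tcategories_tags\torigins_tags\n"
    "1\tHass avocados\ten:fruits,en:avocados\ten:peru\n"
    "2\tAvocado oil\ten:fruit-based-oils\ten:mexico\n"
)
txt_text = "en:avocados\nen:fresh-avocados\n"

# Read the tab-separated table into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text), sep="\t")

# Assumption: one category tag per line; whitespace is stripped, blank lines skipped.
relevant = [line.strip() for line in io.StringIO(txt_text) if line.strip()]

print(df.shape)   # (2, 4)
print(relevant)   # ['en:avocados', 'en:fresh-avocados']
```

    The commas inside `categories_tags` are not treated as column separators because the separator is a tab.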

    Here are some other key points about these files:

    • Some rows in each of the three CSV files do not contain data relevant to your investigation. In each dataset, you will need to filter out these rows based on the values in the categories_tags column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where categories_tags contains at least one of the relevant tags for that ingredient.
    • Each row of data usually has multiple comma-separated category tags in the categories_tags column.
    • Each CSV file has a column called origins_tags containing the country of origin of each item as a tag string (for example, prefixed with en:).
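    Each `categories_tags` value holds several comma-separated tags, so the row filter must ask whether *any* tag in a row is in the relevant list. The project code does this with a row-wise `apply`. A vectorized alternative (a swapped-in technique, not the project's method) works in three steps:

    • Split the tags and `explode` them to one tag per row.
    • Test membership with `isin`.
    • Collapse back to one boolean per original row with `groupby(level=0).any()`.

    The DataFrame and tag list below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; real data comes from the Open Food Facts export.
df = pd.DataFrame({
    "product_name_en": ["Hass avocados", "Guacamole", "Avocado oil"],
    "categories_tags": [
        "en:fruits,en:avocados",
        "en:dips,en:guacamole",
        "en:fruit-based-oils,en:avocado-oils",
    ],
})
relevant = ["en:avocados", "en:fresh-avocados"]

# One tag per row; the index still points back to the original row.
tags = df["categories_tags"].str.split(",").explode()

# True for each original row where at least one tag is relevant.
mask = tags.isin(relevant).groupby(level=0).any()

filtered = df[mask]
print(filtered["product_name_en"].tolist())  # ['Hass avocados']
```

    Because `explode` repeats the original index for every tag, grouping on `level=0` regroups the tags by product without any extra key column.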

    After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

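    import pandas as pd

    def read_relevant_categories(filepath):
        """
        Reads a file and returns a list of categories.

        Args:
            filepath (str): The path to the file.

        Returns:
            list: A list of categories.
        """
        with open(filepath, 'r') as file:
            # Assumes one category tag per line; strip whitespace and skip blank lines
            categories = [line.strip() for line in file if line.strip()]
        return categories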
    def filter_categories(x, relevant_categories):
        """
        Checks whether a list of categories contains any relevant category.

        Args:
            x (list): The list of categories to check.
            relevant_categories (list): The list of relevant categories.

        Returns:
            bool: True if any of the categories in x are in the relevant categories,
            False otherwise.
        """
        return any(i in relevant_categories for i in x)
    relevant_cols = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 
                     'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 
                     'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
    # Load the data files (tab-separated despite the .csv extension)
    avocado = pd.read_csv('data/avocado.csv', header=0, sep='\t')
    olive_oil = pd.read_csv('data/olive_oil.csv', header=0, sep='\t')
    sourdough = pd.read_csv('data/sourdough.csv', header=0, sep='\t')
    # Load the category tags
    relevant_avocado_cat = read_relevant_categories('data/relevant_avocado_categories.txt')
    relevant_olive_oil_cat = read_relevant_categories('data/relevant_olive_oil_categories.txt')
    relevant_sourdough_cat = read_relevant_categories('data/relevant_sourdough_categories.txt')
    # Subset to include only the relevant columns
    # (.copy() avoids SettingWithCopyWarning when columns are reassigned below)
    avocado_sub = avocado[relevant_cols].copy()
    olive_oil_sub = olive_oil[relevant_cols].copy()
    sourdough_sub = sourdough[relevant_cols].copy()
    # Drop rows with null categories_tags
    avocado_sub = avocado_sub.dropna(subset='categories_tags')
    olive_oil_sub = olive_oil_sub.dropna(subset='categories_tags')
    sourdough_sub = sourdough_sub.dropna(subset='categories_tags')
    # Convert the categories_tags column into a column of lists
    avocado_sub['categories_tags'] = avocado_sub['categories_tags'].str.split(',')
    olive_oil_sub['categories_tags'] = olive_oil_sub['categories_tags'].str.split(',')
    sourdough_sub['categories_tags'] = sourdough_sub['categories_tags'].str.split(',')
    # Keep only rows where categories_tags contains at least one relevant tag for that ingredient
    filtered_avocado = avocado_sub[
        avocado_sub['categories_tags'].apply(lambda x: filter_categories(x, relevant_avocado_cat))
    ]
    filtered_olive_oil = olive_oil_sub[
        olive_oil_sub['categories_tags'].apply(lambda x: filter_categories(x, relevant_olive_oil_cat))
    ]
    filtered_sourdough = sourdough_sub[
        sourdough_sub['categories_tags'].apply(lambda x: filter_categories(x, relevant_sourdough_cat))
    ]
    # Filter each ingredient to products sold in the United Kingdom
    uk_avocado = filtered_avocado[filtered_avocado['countries'] == 'United Kingdom']
    uk_olive_oil = filtered_olive_oil[filtered_olive_oil['countries'] == 'United Kingdom']
    uk_sourdough = filtered_sourdough[filtered_sourdough['countries'] == 'United Kingdom']
    # Count the values in the origins_tags column and take the most common one
    uk_avocado_top = uk_avocado['origins_tags'].value_counts().index[0]
    uk_olive_oil_top = uk_olive_oil['origins_tags'].value_counts().index[0]
    uk_sourdough_top = uk_sourdough['origins_tags'].value_counts().index[0]
    # Remove the exact 'en:' prefix (str.removeprefix, Python 3.9+) and replace hyphens
    # in country names with spaces; lstrip('en:') would strip a character set, not a prefix
    top_avocado_origin = uk_avocado_top.removeprefix('en:').replace('-', ' ')
    top_olive_oil_origin = uk_olive_oil_top.removeprefix('en:').replace('-', ' ')
    top_sourdough_origin = uk_sourdough_top.removeprefix('en:').replace('-', ' ')
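    Origin tags carry an `en:` prefix that has to be removed before the country name is usable. `str.lstrip` takes a *set of characters*, not a prefix. For example, `'en:netherlands'.lstrip('en:')` also strips the leading `n` and `e` of the country name and returns `'therlands'`. `str.removeprefix` (Python 3.9+) removes an exact prefix only. The sketch below wraps this in a hypothetical helper, `clean_origin`, and applies it to the most common value of a made-up `origins_tags` column.

```python
import pandas as pd


def clean_origin(tag: str) -> str:
    """Turn a tag such as 'en:united-kingdom' into 'united kingdom'.

    Hypothetical helper: assumes tags use the 'en:' language prefix.
    """
    return tag.removeprefix("en:").replace("-", " ")


# lstrip treats 'en:' as a character set, so it eats into the country name.
print("en:netherlands".lstrip("en:"))    # therlands
print(clean_origin("en:netherlands"))    # netherlands

# Made-up origins_tags values for a handful of products.
origins = pd.Series(["en:peru", "en:peru", "en:spain", "en:united-kingdom"])

# value_counts sorts by frequency (descending), so index[0] is the most common tag.
top = origins.value_counts().index[0]
print(clean_origin(top))                 # peru
```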