Project: What's in an Avocado Toast: A Supply Chain Analysis

    You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

    It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the Open Food Facts database. This database contains extensive, openly sourced information on many foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish. The data is stored as tab-separated files (with a .csv extension) in the data/ folder provided.

    After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

    ### Task 1. Reading in the avocado data
    
    # Reading tab-delimited data
    import pandas as pd
    avocado = pd.read_csv('data/avocado.csv', sep='\t')
    
    # Subsetting a DataFrame to include only relevant columns
    subset_columns = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
    avocado = avocado[subset_columns]
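
Subsetting after loading works, but `pd.read_csv` can also drop unneeded columns while parsing, via its `usecols` parameter. The sketch below uses a made-up two-row sample (not real Open Food Facts rows) in the same tab-separated layout; reading `code` as a string is an added assumption, meant to keep barcodes with leading zeros intact.

```python
import io

import pandas as pd

# Made-up tab-separated sample standing in for data/avocado.csv
sample = (
    "code\tproduct_name_en\tcountries\torigins_tags\tunused\n"
    "0001\tHass avocados\tUnited Kingdom\ten:peru\tx\n"
    "0002\tRipe avocados\tFrance\ten:spain\ty\n"
)

wanted = ['code', 'product_name_en', 'countries', 'origins_tags']

# usecols skips unwanted columns at parse time; dtype keeps barcodes as text
df = pd.read_csv(io.StringIO(sample), sep='\t', usecols=wanted, dtype={'code': str})
print(df)
```

Note that `usecols` returns the columns in file order, not in the order of the list passed in.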
    
    ### Task 2. Filter avocado data using relevant category tags
    
    # Dropping rows with null values in a particular column
    # .copy() so the column added below is set on an independent frame
    avocado = avocado.dropna(subset=['categories_tags']).copy()
    
    # Turning a column of comma separated tags into a column of lists
    avocado['categories_list'] = avocado['categories_tags'].str.split(',')
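
`Series.str.split` returns one list per row and leaves missing values missing, which is why rows without tags are dropped before the lists are inspected: iterating over a missing value would raise a `TypeError`. A small illustration with made-up tags:

```python
import pandas as pd

# Made-up category strings; the middle product has no tags
tags = pd.Series(['en:fruits,en:avocados', None, 'en:fresh-foods'])

# Each string becomes a list of tags; the missing value stays missing
lists = tags.str.split(',')
print(lists.tolist())

# Drop the missing entries before iterating over each list
clean = lists.dropna()
print(clean.tolist())  # [['en:fruits', 'en:avocados'], ['en:fresh-foods']]
```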
    
    # Identifying relevant categories for avocados
    relevant_avocado_categories = [
         'en:avocadoes',
         'en:avocados',
         'en:fresh-foods',
         'en:fresh-vegetables',
         'en:fruchte',
         'en:fruits',
         'en:raw-green-avocados',
         'en:tropical-fruits',
         'en:tropische-fruchte',
         'en:vegetables-based-foods',
         'fr:hass-avocados'
    ]
    
    
    # Filtering a DataFrame based on a column of lists
    avocado = avocado[avocado['categories_list'].apply(lambda tags: any(tag in relevant_avocado_categories for tag in tags))]
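
The `any(...)` test scans the list of relevant categories once per tag. A common alternative, sketched below with made-up products, converts the relevant tags to a `set` and keeps a row when the two collections are not disjoint; `set.isdisjoint` accepts any iterable, so the per-row lists need no conversion.

```python
import pandas as pd

# Hypothetical subset of relevant tags, stored as a set for fast membership tests
relevant = {'en:avocados', 'en:fruits'}

df = pd.DataFrame({
    'product': ['Hass avocados', 'Guacamole dip', 'Avocado halves'],
    'categories_list': [
        ['en:fresh-foods', 'en:avocados'],
        ['en:dips', 'en:sauces'],
        ['en:fruits'],
    ],
})

# Keep a row when its tag list shares at least one tag with `relevant`
mask = df['categories_list'].apply(lambda tags: not relevant.isdisjoint(tags))
filtered = df[mask]
print(filtered['product'].tolist())  # ['Hass avocados', 'Avocado halves']
```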
    
    ### Task 3. Where do most avocados come from?
    
    # Filtering your DataFrame by a particular country
    avocados_uk = avocado[(avocado['countries']=='United Kingdom')]
    
    # Returning counts of unique values in a column
    print('**avocado origins**:', '\n', avocados_uk['origins_tags'].value_counts(),  '\n')
    avocado_origin = 'Peru'
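
Here `avocado_origin` is set by reading the printed counts. A sketch of deriving it instead, using made-up `origins_tags` values: splitting multi-origin tags such as `en:peru,en:chile` and exploding them counts each country once per product, and `idxmax` picks the most frequent one. Turning the tag back into a country name with `title()` is a simplification that assumes plain `en:` tags.

```python
import pandas as pd

# Made-up origins_tags values for UK avocado products
origins = pd.Series(['en:peru', 'en:peru', 'en:chile', None, 'en:peru,en:chile'])

# One row per (product, origin) pair; missing origins are dropped first
per_country = origins.dropna().str.split(',').explode().value_counts()
print(per_country)

top_tag = per_country.idxmax()
# Simplification: strip the language prefix and capitalise, e.g. 'en:peru' -> 'Peru'
top_origin = top_tag.split(':', 1)[1].replace('-', ' ').title()
print(top_origin)  # Peru
```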
    
    ### Task 4. Don't Repeat Yourself (DRY): Create a user-defined function instead!
    
    # Create a user-defined function to read and filter data
    def read_and_filter_data(filepath, relevant_categories):
        df = pd.read_csv('data/' + filepath, sep='\t')
    
        # Subset data; .copy() so the column added below is set on an independent frame
        df = df[subset_columns].copy()
    
        # Split tags into lists (rows without tags stay missing)
        df['categories_list'] = df['categories_tags'].str.split(',')
    
        # Drop null categories, then keep products with at least one relevant category
        df = df.dropna(subset=['categories_list'])
        df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]
    
        # Keep products sold in the United Kingdom and report their origins
        df = df[df['countries'] == 'United Kingdom']
        print(f'**{filepath[:-4]} origins**', '\n', df['origins_tags'].value_counts(), '\n')
    
        return df
    
    # Identify relevant categories for lemons
    relevant_lemon_categories = [
     'en:aromatic-herbs',
     'en:aromatic-plants',
     'en:citron',
     'en:citrus',
     'en:fresh-fruits',
     'en:fresh-lemons',
     'en:fruits',
     'en:lemons',
     'en:unwaxed-lemons'
    ]
    
    # Call your user-defined function on lemon.csv
    lemons = read_and_filter_data('lemon.csv',relevant_lemon_categories)
    lemon_origin = 'South Africa'
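
`read_and_filter_data` mixes file reading, filtering and printing, which makes it hard to check without the files in `data/`. One possible refactor, sketched below, is a hypothetical `filter_products` helper that takes an already-loaded DataFrame and returns the filtered rows without printing; the sample frame is made up.

```python
import pandas as pd


def filter_products(df, relevant_categories, country='United Kingdom'):
    """Hypothetical pure variant of read_and_filter_data: filters an
    already-loaded DataFrame and returns the matching rows."""
    relevant = set(relevant_categories)
    # Drop untagged rows first, then copy so the new column is set safely
    out = df.dropna(subset=['categories_tags']).copy()
    out['categories_list'] = out['categories_tags'].str.split(',')
    has_relevant = out['categories_list'].apply(lambda tags: not relevant.isdisjoint(tags))
    in_country = out['countries'] == country
    return out[has_relevant & in_country]


# Made-up sample rows
sample = pd.DataFrame({
    'categories_tags': ['en:lemons,en:citrus', None, 'en:lemons', 'en:juices'],
    'countries': ['United Kingdom', 'United Kingdom', 'France', 'United Kingdom'],
    'origins_tags': ['en:south-africa', 'en:spain', 'en:spain', 'en:italy'],
})

uk_lemons = filter_products(sample, ['en:lemons', 'en:citrus'])
print(uk_lemons['origins_tags'].tolist())  # ['en:south-africa']
```

In the notebook, such a helper could be called as `filter_products(pd.read_csv('data/lemon.csv', sep='\t'), relevant_lemon_categories)`, keeping file access at the call site.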
    
    ### Task 5. Call your user-defined function on the olive oil, salt and sourdough data
    
    # Call your user-defined function on olive_oil.csv
    
    # The with-block closes the file on exit
    with open("data/relevant_olive_oil_categories.txt", "r") as file:
        relevant_olive_oil_categories = file.read().splitlines()
    
    olive_oil = read_and_filter_data('olive_oil.csv',relevant_olive_oil_categories)
    olive_oil_origin = 'Greece'
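
Each `relevant_*_categories.txt` file is read with one tag per line. If a file ends with a blank line or contains stray spaces, `splitlines()` yields entries such as `''` that never match a category tag. A slightly more defensive reader, sketched below as a hypothetical `read_categories` helper, strips whitespace and skips empty lines; the demo writes a made-up file to a temporary directory.

```python
import tempfile
from pathlib import Path


def read_categories(path):
    """Hypothetical helper: one category tag per line, blanks and stray spaces removed."""
    text = Path(path).read_text(encoding='utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo with a made-up file standing in for data/relevant_olive_oil_categories.txt
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'categories.txt'
    demo.write_text('en:olive-oils\n  en:extra-virgin-olive-oils \n\n', encoding='utf-8')
    cats = read_categories(demo)

print(cats)  # ['en:olive-oils', 'en:extra-virgin-olive-oils']
```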
    
    # Call your user-defined function on sourdough.csv
    
    # The with-block closes the file on exit
    with open("data/relevant_sourdough_categories.txt", "r") as file:
        relevant_sourdough_categories = file.read().splitlines()
        
    sourdough = read_and_filter_data('sourdough.csv',relevant_sourdough_categories)
    sourdough_origin = 'United Kingdom'
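
The olive oil, sourdough and salt calls repeat the same pattern for each ingredient. A hypothetical `collect_top_origins` helper, sketched below, loops over a dictionary of ingredients instead and returns the most common origin tag for each. It takes the loading function as a parameter, so in the notebook `read_and_filter_data` could be passed in; the demo uses a stand-in loader with made-up data, and the category lists are placeholders.

```python
import pandas as pd


def collect_top_origins(sources, load):
    """Return the most common origins_tags value per ingredient.

    sources maps ingredient -> (filename, relevant_categories); load is any
    function with the same (filepath, relevant_categories) signature as
    read_and_filter_data that returns a DataFrame with an origins_tags column.
    """
    top = {}
    for ingredient, (filename, categories) in sources.items():
        df = load(filename, categories)
        counts = df['origins_tags'].value_counts()  # missing origins are ignored
        top[ingredient] = counts.idxmax() if not counts.empty else None
    return top


# Stand-in loader with made-up origins, so the sketch runs without data/
def fake_load(filename, categories):
    made_up = {
        'olive_oil.csv': ['en:greece', 'en:greece', 'en:italy'],
        'sourdough.csv': ['en:united-kingdom'],
        'salt_flakes.csv': [None],
    }
    return pd.DataFrame({'origins_tags': made_up[filename]})


# Placeholder category lists for illustration only
sources = {
    'olive oil': ('olive_oil.csv', ['en:olive-oils']),
    'sourdough': ('sourdough.csv', ['en:sourdough-breads']),
    'salt flakes': ('salt_flakes.csv', ['en:salts']),
}

result = collect_top_origins(sources, fake_load)
print(result)
```

With the real files, `collect_top_origins(sources, read_and_filter_data)` would print each ingredient's counts as before and collect the top tags in one dictionary.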
    
    relevant_salt_categories = [
     'en:edible-common-salt',
     'en:salts',
     'en:sea-salts']
    
    # Call your user-defined function on salt.csv
    
    salt_flakes = read_and_filter_data('salt_flakes.csv',relevant_salt_categories)
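
With the individual origins identified, they can be gathered into the list of ingredients and countries of origin described in the introduction. The sketch below copies the origin values assigned above; salt has no origin variable in this section, so it is left as missing rather than guessed. Filtering out domestic and missing origins leaves the imported ingredients, which are presumably the ones a follow-up analysis of time at sea would focus on.

```python
import pandas as pd

# Origins as assigned earlier in this notebook; salt has none assigned here
origins = {
    'avocado': 'Peru',
    'lemon': 'South Africa',
    'olive oil': 'Greece',
    'sourdough': 'United Kingdom',
    'salt flakes': None,
}

summary = pd.DataFrame(list(origins.items()), columns=['ingredient', 'origin'])

# Imported = known origin outside the United Kingdom
imported = summary[summary['origin'].notna() & (summary['origin'] != 'United Kingdom')]
print(imported['ingredient'].tolist())  # ['avocado', 'lemon', 'olive oil']
```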