(Python) Project: What's in an Avocado Toast: A Supply Chain Analysis

    You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil. It's no small feat that most of these ingredients are readily available in grocery stores.

    In this project, you'll conduct a supply chain analysis of three of the ingredients used in an avocado toast, using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

    Three pairs of files are provided in the data folder:

    • A CSV file for each ingredient, such as avocado.csv, with data about each food item and countries of origin
    • A TXT file for each ingredient, such as relevant_avocado_categories.txt, containing only the category tags of interest for that food.

    Here are some other key points about these files:

    • Some rows in each of the three CSV files are not relevant to your investigation. In each dataset, you will need to filter out these rows based on the values in the categories_tags column. Examples of categories include fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where categories_tags contains at least one of the tags in the relevant categories for that ingredient.
    • Each row of data usually has multiple category tags in the categories_tags column.
    • There is a column in each CSV file called origins_tags with strings for country of origin of that item.

    After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

    You will apply your data manipulation and analysis skills to the supply chain of ingredients for making an avocado toast in the U.K. You need to determine the following information:

    The name of the most common country (or countries) of origin for three key ingredients: avocados, olive oil, and sourdough. For the solution, store the most common country of origin for each ingredient as a string, with one string for each country, in the appropriate variable: top_avocado_origin, top_olive_oil_origin, top_sourdough_origin. If the country name data contains hyphens or other non-letter characters, clean it up so that the name contains only the letters A-Z and (possibly) spaces.
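
    The clean-up rule above can be sketched as a small helper. This is a minimal sketch, not the project's official solution: the function name clean_country_name is hypothetical, and it assumes that origin tags look like 'en:united-kingdom' (a language prefix, a colon, then a hyphenated name), which you should confirm against the actual origins_tags values.

```python
import re


def clean_country_name(tag: str) -> str:
    """Turn an origin tag such as 'en:united-kingdom' into 'United Kingdom'.

    Hypothetical helper; the 'prefix:name' tag layout is an assumption.
    """
    # Keep only the part after the first colon, if a colon is present.
    name = tag.split(":", 1)[-1]
    # Replace hyphens and any other non-letter characters with a space.
    name = re.sub(r"[^A-Za-z]+", " ", name)
    # Trim stray spaces and capitalise each word.
    return name.strip().title()


print(clean_country_name("en:united-kingdom"))  # United Kingdom
print(clean_country_name("en:peru"))            # Peru
```

    Splitting on the first colon removes the prefix as a unit, which avoids accidentally trimming leading letters that belong to the country name itself.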

    Note: Because the CSV data files are quite large, and have numerous unused columns, you should subset each of the DataFrames to only include these relevant columns: 'code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags'.
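
    One way to avoid loading the unused columns at all is the usecols parameter of pd.read_csv. The sketch below is an illustration under stated assumptions: it reads a tiny made-up tab-delimited table from memory rather than the real data files, and the shortened column list toy_columns is hypothetical.

```python
import io

import pandas as pd

# Made-up tab-delimited sample standing in for a real data file.
toy_tsv = (
    "code\tproduct_name_en\tcategories_tags\torigins_tags\tunused_column\n"
    "1\tHass avocado\ten:fruits,en:avocados\ten:peru\tx\n"
    "2\tGuacamole\ten:dips\ten:spain\ty\n"
)

# Shortened, hypothetical column list; the project uses the 14 columns listed above.
toy_columns = ["code", "categories_tags", "origins_tags"]

# usecols keeps only the named columns while the file is being parsed.
toy = pd.read_csv(io.StringIO(toy_tsv), sep="\t", usecols=toy_columns)

print(toy.columns.tolist())  # ['code', 'categories_tags', 'origins_tags']
print(toy.shape)             # (2, 3)
```

    Reading the full file and then selecting the columns gives the same DataFrame; usecols simply skips the intermediate, larger one.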

    After you complete this project, feel free to analyze this food data for other questions you might be interested in exploring!

    1. Read in the avocado data

    Begin by reading the avocado data from the CSV file in the data folder. Despite the .csv extension, the file is tab-delimited. This creates quite a large DataFrame, so it's a good idea to subset it to a smaller number of relevant columns. Then read in the file of relevant category tags for avocados.

    # Read tab-delimited data
    import pandas as pd
    
    avocado = pd.read_csv('data/avocado.csv', sep='\t')
    avocado.head()
    avocado.columns
    # Subset large DataFrame to include only relevant columns
    column = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
    
    # Select the relevant columns; .copy() avoids chained-assignment warnings later
    avocado = avocado[column].copy()
    avocado
    # Gather relevant categories data for avocados using with statement
    # The with statement closes the file automatically, so no explicit close() is needed
    with open('data/relevant_avocado_categories.txt', "r") as file:
        relevant_avocado_categories = file.read().splitlines()
    avocado['categories_tags'].value_counts()

    2. Filter avocado data using relevant category tags

    Each food DataFrame contains a column called categories_tags, which contains the food item category, e.g., fruits, vegetables, fruit-based oils, etc. Start by dropping rows with null values in categories_tags.

    This column is comma-separated, so you'll first need to turn it into a column of lists so that you can treat each item in the list as a separate tag. Filter this reduced DataFrame to contain only the rows where there is a relevant category tag.

    # Drop rows with null values in a particular column
    avocado = avocado.dropna(subset=['categories_tags']).copy()
    avocado
    # Turn a column of comma-separated tags into a column of lists
    avocado['categories_list'] = avocado['categories_tags'].str.split(',')
    avocado[['categories_list']]
    # Filter a DataFrame based on a column of lists:
    # keep a row if at least one of its tags is in the relevant categories
    avocado = avocado[avocado['categories_list'].apply(lambda tags: any(tag in relevant_avocado_categories for tag in tags))]
    
    avocado
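
    To see what the filter keeps and discards, here is a minimal sketch on a made-up three-row DataFrame. The product names and the tag values in relevant are hypothetical and are not taken from the project's data files.

```python
import pandas as pd

# Hypothetical relevant tags; the project reads these from a TXT file.
relevant = ["en:avocados", "en:fresh-avocados"]

toy = pd.DataFrame({
    "product_name_en": ["Hass avocado", "Guacamole dip", "Unknown item"],
    "categories_tags": ["en:fruits,en:avocados", "en:dips,en:guacamole", None],
})

# Step 1: drop rows with no category information.
toy = toy.dropna(subset=["categories_tags"]).copy()

# Step 2: split the comma-separated string into a list of tags.
toy["categories_list"] = toy["categories_tags"].str.split(",")

# Step 3: keep rows that share at least one tag with the relevant set.
relevant_set = set(relevant)
mask = toy["categories_list"].apply(lambda tags: not relevant_set.isdisjoint(tags))
filtered = toy[mask]

print(filtered["product_name_en"].tolist())  # ['Hass avocado']
```

    Using a set for the relevant tags makes each membership check constant-time, which may matter on the full-size files; the result is the same as checking against a list.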

    3. Where do most UK avocados come from?

    Your avocado DataFrame should contain a column called origins_tags. Create a variable called top_avocado_origin, containing the top country where avocados in the United Kingdom come from.

    # Filter DataFrame for UK data
    # Exact match: only rows whose countries value is exactly 'United Kingdom' are kept
    avocados_uk = avocado[avocado['countries'] == 'United Kingdom']
    avocados_uk
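
    Finding the most common origin can then be sketched with value_counts. This is a minimal illustration on made-up data, not the project's official solution: toy_uk and its tag values are hypothetical, and each origins_tags value is assumed to hold a single tag.

```python
import pandas as pd

# Made-up stand-in for the UK-filtered avocado DataFrame.
toy_uk = pd.DataFrame({
    "origins_tags": ["en:peru", "en:spain", "en:peru", None, "en:south-africa"],
})

# value_counts ignores missing values and sorts by descending frequency.
counts = toy_uk["origins_tags"].value_counts()

# The first index entry is the most frequent tag.
top_tag = counts.index[0]

# Collect every tag that shares the highest count, in case of a tie.
tied_tags = counts[counts == counts.max()].index.tolist()

print(top_tag)    # en:peru
print(tied_tags)  # ['en:peru']
```

    The resulting tag still carries its raw formatting, so it needs the clean-up described in the task statement before it is stored in top_avocado_origin.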