
Every day, professionals wade through hundreds of emails, from urgent client requests to promotional offers. It's like trying to find important messages in a digital ocean. But AI can help you stay afloat by automatically sorting emails to highlight what matters most.

You've been asked to build an intelligent email assistant with Llama that automatically classifies users' incoming emails. Your system will identify which emails need immediate attention, which are regular updates, and which are promotions that can wait or be archived.

The Data

You'll work with a dataset of various email examples, ranging from urgent business communications to promotional offers. Here's a peek at what you'll be working with:

email_categories_data.csv

Column             Description
email_id           A unique identifier for each email in the dataset.
email_content      The full email text including subject line and body. Each email follows a format of "Subject" followed by the message content on a new line.
expected_category  The correct classification of the email: Priority, Updates, or Promotions. This will be used to validate your model's performance.
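Since each email_content value puts the subject on the first line and the message on the following lines, the two parts can be separated with a single split. The sketch below is only an illustration: split_email is a hypothetical helper name, not part of the project, and it assumes the first newline is the boundary between subject and body.

```python
def split_email(email_content):
    """Split an email into (subject, body) at the first newline.

    Hypothetical helper: assumes the subject is the first line and
    everything after the first newline is the message body.
    """
    subject, _, body = email_content.partition("\n")
    return subject.strip(), body.strip()


# Example with made-up text in the dataset's described format
subject, body = split_email("Flash Sale - 24 Hours Only!\nEverything must go!")
print(subject)
print(body)
```

If an email has no newline at all, the whole text is treated as the subject and the body comes back as an empty string.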
# Run the following cells first
# Install the necessary packages, then import the model by running the cell below
!pip install llama-cpp-python==0.2.82 -q -q -q
# SQL cell (result stored in the DataFrame variable df):
# SELECT *
# FROM 'models.csv'
# LIMIT 5
# Import required libraries
import pandas as pd
from llama_cpp import Llama
# Load the email dataset
emails_df = pd.read_csv('data/email_categories_data.csv')
# Display the first few rows of our dataset
print("Preview of our email dataset:")

emails_df.head(2)
emails_df.loc[emails_df['expected_category']=='Promotions'].head()
# Set the model path
model_path = "/files-integrations/files/c9696c24-44f3-45f7-8ccd-4b9b046e7e53/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf"
# Start coding here
# Use as many cells as you need
email_classifier = Llama(model_path=model_path)

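def classifier(classification_prompt, text, classification_model=email_classifier):
    """Classify one email and return the first line of the model output with spaces removed."""
    # Flatten the email onto a single line so it matches the quoted few-shot format
    cleaned_text = text.replace("\n", " ")
    # Append the email as Email4 and cue the model to complete its classification
    prompt_format = f'{classification_prompt} "{cleaned_text}"\nEmail4_Classification:'
    # temperature=0 selects greedy decoding for repeatable output; the stop
    # sequence ends generation if the model starts another classification line
    response = classification_model(prompt=prompt_format, stop=['Email4_Classification:'], temperature=0)
    return response['choices'][0]['text'].split("\n")[0].replace(" ", "")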
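The classifier returns whatever text the model produces on its first output line, so a small model may add punctuation, change the casing, or wrap the label in extra words. One possible safeguard is to map the raw text onto the three allowed labels before storing it. This is a sketch under assumptions: normalize_label, VALID_LABELS, and the "Unknown" fallback are hypothetical names introduced here for illustration.

```python
# The three categories named in the dataset description
VALID_LABELS = ("Priority", "Updates", "Promotions")


def normalize_label(raw_output, labels=VALID_LABELS, default="Unknown"):
    """Return the allowed label that appears earliest in raw_output.

    Matching is case-insensitive. If no allowed label appears,
    the default value is returned instead.
    """
    lowered = raw_output.lower()
    # Record (position, label) for every allowed label found in the text
    hits = [(lowered.find(label.lower()), label)
            for label in labels if label.lower() in lowered]
    return min(hits)[1] if hits else default


print(normalize_label("promotions."))
print(normalize_label("Icannotclassifythis"))
```

Wrapping the classifier output in a helper like this keeps later comparisons against expected_category from failing on formatting differences alone.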
prompt = f'''
You are an email classification bot. You do not engage in conversation or give an explanation. You only classify emails. The only three options are Priority, Updates, and Promotions. Please add the last classification as a single-word response. This must be Priority, Updates, or Promotions. Return one of these words only. Please complete this list of emails and their classifications.

Continue this list with the classification of Email4. If an email mentions a "sale" or "discount" it is Promotions.

Email1: "There will be a meeting tomorrow at 10AM. Please respond and confirm you will be attending."
Email1_Classification: Priority
Email2: "Monthly Department Updates. Review this month's KPIs and upcoming projects. New policies attached for review."
Email2_Classification: Updates
Email3: "Flash Sale - 24 Hours Only!. Everything must go! Massive discounts on all items. Shop now before it's too late! 50% off!"
Email3_Classification: Promotions
Email4: ''' 
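The prompt above follows a regular pattern: instructions, then numbered email and classification pairs, then an open slot for the next email. If you want to experiment with different examples, the same layout could be generated from a list rather than typed by hand. The sketch below is one way to do that; build_few_shot_prompt is a hypothetical helper, and it assumes the same EmailN / EmailN_Classification naming used in the prompt.

```python
def build_few_shot_prompt(instructions, examples):
    """Assemble a few-shot prompt in the EmailN / EmailN_Classification layout.

    examples is a list of (email_text, label) pairs. The prompt ends with an
    open "EmailN: " slot for the email that should be classified next.
    """
    lines = [instructions.strip(), ""]
    for index, (email_text, label) in enumerate(examples, start=1):
        lines.append(f'Email{index}: "{email_text}"')
        lines.append(f"Email{index}_Classification: {label}")
    # Leave the next email slot open for the text to classify
    lines.append(f"Email{len(examples) + 1}: ")
    return "\n".join(lines)


example_prompt = build_few_shot_prompt(
    "Classify each email as Priority, Updates, or Promotions.",
    [("There will be a meeting tomorrow at 10AM.", "Priority")],
)
print(example_prompt)
```

Note that the classifier function hard-codes Email4_Classification, so a prompt built this way only lines up with it when exactly three examples are supplied.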


classifier(classification_prompt=prompt, text=emails_df.iloc[7]['email_content'])
# Classify every email with the few-shot prompt defined above
emails_df['predicted'] = emails_df['email_content'].apply(lambda x: classifier(classification_prompt=prompt, text=x))
emails_df.head()
result1 = emails_df.iloc[0]['predicted']
result2 = emails_df.iloc[1]['predicted']
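The dataset description says expected_category is there to validate the model, so a natural last step is to compare predictions against it. The following is a minimal sketch of that comparison using plain Python; accuracy_report is a hypothetical helper, and the sample labels below are made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def accuracy_report(predicted, expected):
    """Return (overall_accuracy, per_category_accuracy) for two label sequences.

    Hypothetical helper: predicted and expected must be the same length;
    per-category accuracy is keyed by the expected label.
    """
    pairs = list(zip(predicted, expected))
    if not pairs:
        raise ValueError("No predictions to evaluate")
    # Count how often each expected label occurs and how often it was matched
    totals = Counter(exp for _, exp in pairs)
    correct = Counter(exp for pred, exp in pairs if pred == exp)
    overall = sum(correct.values()) / len(pairs)
    per_category = {label: correct[label] / totals[label] for label in totals}
    return overall, per_category


# Made-up labels for illustration only
overall, per_category = accuracy_report(
    ["Priority", "Updates", "Updates", "Promotions"],
    ["Priority", "Updates", "Promotions", "Promotions"],
)
print(overall)
print(per_category)
```

Because the function only iterates over its inputs, it should also accept pandas columns such as emails_df['predicted'] and emails_df['expected_category'].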