Project: Analyzing Car Reviews with LLMs
    Car-ing is sharing, an auto dealership company for car sales and rentals, is taking its services to the next level thanks to Large Language Models (LLMs).

    As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

    The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.

    Before you start

    To complete the project, you may need to install Hugging Face libraries such as transformers and evaluate.

    !pip install transformers
    !pip install evaluate
    # Suppress verbose transformers warnings
    from transformers import logging
    logging.set_verbosity_error()
    # Load the dataset
    import pandas as pd

    df = pd.read_csv("data/car_reviews.csv", delimiter=";")
    # Put the car reviews and their associated sentiment labels in two lists
    reviews = df['Review'].tolist()
    real_labels = df['Class'].tolist()

    Sentiment classification

    Use a pre-trained LLM to classify the sentiment of the five car reviews in the car_reviews.csv dataset, and evaluate the classification accuracy and F1 score of predictions.

    from transformers import pipeline

    classifier = pipeline(task='sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
    # Perform inference on the car reviews and display prediction results
    predicted_labels = classifier(reviews)
    for review, prediction, label in zip(reviews, predicted_labels, real_labels):
        print(f"Review: {review}\nActual Sentiment: {label}\nPredicted Sentiment: {prediction['label']} (Confidence: {prediction['score']:.4f})\n")
    # Load accuracy and F1 score metrics
    import evaluate

    accuracy = evaluate.load("accuracy")
    f1 = evaluate.load("f1")
    # Map categorical sentiment labels into integer labels
    references = [1 if label == "POSITIVE" else 0 for label in real_labels]
    predictions = [1 if label['label'] == "POSITIVE" else 0 for label in predicted_labels]
    # Calculate accuracy and F1 score
    accuracy_result = accuracy.compute(references=references, predictions=predictions)['accuracy']
    f1_result = f1.compute(references=references, predictions=predictions)['f1']
    print(f"Accuracy: {accuracy_result}")
    print(f"F1 score: {f1_result}")
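    To make the metrics concrete, here is a hand computation of accuracy and F1 on a toy set of binary labels. The values are illustrative only, not taken from the dataset; `evaluate` performs the same arithmetic internally.

    ```python
    # Toy binary labels (1 = POSITIVE, 0 = NEGATIVE); values are illustrative only.
    references = [1, 0, 1, 1, 0]
    predictions = [1, 0, 0, 1, 1]

    # Accuracy: fraction of exact matches.
    accuracy = sum(r == p for r, p in zip(references, predictions)) / len(references)

    # F1 for the positive class: harmonic mean of precision and recall.
    tp = sum(r == 1 and p == 1 for r, p in zip(references, predictions))
    fp = sum(r == 0 and p == 1 for r, p in zip(references, predictions))
    fn = sum(r == 1 and p == 0 for r, p in zip(references, predictions))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    print(accuracy)  # 0.6
    print(f1)        # 0.666...
    ```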


    The company has recently been attracting customers from Spain. Extract the first two sentences of the first review in the dataset and pass them to an English-to-Spanish translation LLM. Calculate the BLEU score to assess translation quality, using the content in reference_translations.txt as references.

    # Load translation LLM into a pipeline and translate the first two sentences of the first car review
    first_review = reviews[0]
    first_two_sentences = ". ".join(first_review.split(". ")[:2]) + "."
    translator = pipeline(task='translation', model="Helsinki-NLP/opus-mt-en-es")
    translated_review = translator(first_two_sentences)[0]['translation_text']
    print(f"Model translation:\n{translated_review}")
    # Load reference translations from file
    with open("data/reference_translations.txt", 'r') as file:
        lines = file.readlines()
    references = [line.strip() for line in lines]
    print(f"Spanish translation references:\n{references}")
    # Load and calculate BLEU score metric
    bleu = evaluate.load("bleu")
    bleu_score = bleu.compute(predictions=[translated_review], references=[references])
    print(f"BLEU score: {bleu_score['bleu']}")
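    BLEU scores a candidate by its n-gram overlap with the references. As a rough sketch of the idea, the snippet below computes the clipped unigram (1-gram) precision at the heart of BLEU on toy Spanish sentences; the sentences are made up for illustration and are not the actual review, and full BLEU also combines higher-order n-grams with a brevity penalty.

    ```python
    from collections import Counter

    # Toy candidate and references (illustrative, not the real translation).
    candidate = "el coche es muy bueno".split()
    references = ["el coche es bueno".split(), "el auto es muy bueno".split()]

    # For each word, take its maximum count across all references.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)

    # Clip each candidate word's count by that maximum, then divide by length.
    cand_counts = Counter(candidate)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(candidate)
    print(precision)  # 1.0 — every candidate word appears in some reference
    ```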