Analysis of Clothing Reviews with Embeddings using OpenAI API

In this project, we will dive into a Women's Clothing Reviews dataset, focusing on the 'Review Text' column, which contains customers' own written opinions.

Our mission is to use text embeddings and Python to find similarities among these reviews.

Here is the data dictionary:

Column	Description
`'Review Text'`	Textual feedback provided by customers about their shopping experience and product quality.
`'Class Name'`	Categorical variable indicating the class of clothing the review refers to.

# Read the OpenAI API key from an environment variable
import os
openai_api_key = os.environ["OPENAI_API_KEY"]
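`os.environ["OPENAI_API_KEY"]` raises a bare `KeyError` when the variable is not set. A slightly friendlier variant, sketched below, fails with an explicit message instead and also rejects an empty value. The helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment (hypothetical helper).

    Raises a RuntimeError with a readable message when the variable is
    missing or contains only whitespace, instead of a bare KeyError.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before running the notebook."
        )
    return key
```

With this helper, the assignment above would become `openai_api_key = load_api_key()`.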

Install useful libraries

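# Pin the OpenAI client to version 1.3.0 (install it if it is missing or a different version is present)
from importlib.metadata import version, PackageNotFoundError
try:
    assert version('openai') == '1.3.0'
except (AssertionError, PackageNotFoundError):
    !pip install openai==1.3.0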
    
import openai
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


Load the dataset

Load data and perform basic data checks

# Load the dataset
reviews = pd.read_csv("womens_clothing_e-commerce_reviews.csv")

# Display the first few entries
reviews.head()

# Check for duplicates and missing values
display(reviews.duplicated().sum())

display(reviews.isnull().sum())

# Remove rows with a missing value in the "Review Text" column
reviews.dropna(subset=["Review Text"], inplace=True)

# Extract the "Review Text" and "Class Name" columns as Python lists
reviews_text = reviews["Review Text"].tolist()
class_names = reviews["Class Name"].tolist()

OpenAI connection and Embedding creation

Connect to OpenAI and define a function that creates embeddings for a list of texts

# Connect to OpenAI API
client = openai.OpenAI(api_key=openai_api_key)

def get_embeddings(texts):
    """Return one embedding vector per input text."""
    # Create the embeddings using the model "text-embedding-ada-002"
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=texts,
    )

    # Convert the response to a dictionary
    response_dict = response.model_dump()

    # Return the embeddings
    return [data['embedding'] for data in response_dict['data']]
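`get_embeddings` sends every text in a single request. The embeddings endpoint limits how many inputs one request may carry (2048 per array in the API reference at the time of writing, with overall request size capped as well), so passing all reviews at once may be rejected for a dataset of this size. A minimal batching wrapper is sketched below; `get_embeddings_batched` and `batch_size` are hypothetical names, and the default of 500 is an arbitrary choice, not a documented recommendation.

```python
from typing import Callable, List, Sequence


def get_embeddings_batched(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 500,
) -> List[List[float]]:
    """Embed texts in fixed-size batches and concatenate the results.

    embed_fn is any function mapping a list of strings to a list of
    embedding vectors in the same order, e.g. get_embeddings above.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        # One request per batch keeps each call below the per-request input cap
        embeddings.extend(embed_fn(batch))
    return embeddings
```

Used with the notebook's own function, the call becomes `embeddings = get_embeddings_batched(reviews_text, get_embeddings)`. Passing the embedding function as a parameter keeps the batching logic independent of the API client, which also makes it easy to test offline.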

# Get embeddings for the reviews
embeddings = get_embeddings(reviews_text)
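With one vector per review, similarity between reviews is commonly measured with cosine similarity. The sketch below, assuming `embeddings` is the list returned above, returns the reviews closest to a chosen one; `most_similar` and its parameters are hypothetical. OpenAI documents its embeddings as normalized to unit length, so a plain dot product would give the same ranking, but normalizing explicitly keeps the helper correct for arbitrary vectors.

```python
import numpy as np


def most_similar(embeddings, query_index: int, top_k: int = 5):
    """Return (index, cosine similarity) pairs for the top_k vectors most
    similar to the one at query_index, excluding the query itself."""
    matrix = np.asarray(embeddings, dtype=float)
    # Normalize rows to unit length so the dot product equals cosine similarity
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    # Push the query to the end of the ranking so it is never returned
    scores[query_index] = -np.inf
    k = min(top_k, len(scores) - 1)
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

For example, `most_similar(embeddings, query_index=0, top_k=3)` gives the indices of the three reviews closest to the first one, which can be looked up in `reviews_text` and `class_names`.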

Dimensionality reduction using t-SNE

Use t-SNE to reduce the embeddings to two dimensions
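This step can be sketched with scikit-learn's `TSNE`; the library choice is an assumption, since no implementation is shown here. The helper `reduce_to_2d` is hypothetical, and the `perplexity` and `random_state` defaults are illustrative. scikit-learn requires `perplexity` to be smaller than the number of samples, which matters only for small test inputs.

```python
import numpy as np
from sklearn.manifold import TSNE


def reduce_to_2d(embeddings, perplexity: float = 30.0, random_state: int = 0):
    """Project high-dimensional embeddings to 2-D with t-SNE.

    Returns a NumPy array of shape (n_samples, 2).
    """
    matrix = np.asarray(embeddings, dtype=float)
    # Fixing random_state makes the layout reproducible between runs
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(matrix)
```

The two columns of `coords = reduce_to_2d(embeddings)` can then be passed to `plt.scatter`, for instance with one call per value in `class_names`, to see whether reviews of the same clothing class cluster together. t-SNE preserves local neighborhoods rather than global distances, so cluster sizes and the gaps between clusters should not be read too literally.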
