
Welcome to the world of e-commerce, where customer feedback is a goldmine of insights! In this project, you'll dive into the Women's Clothing E-Commerce Reviews dataset, focusing on the 'Review Text' column filled with direct customer opinions.

Your mission is to use text embeddings and Python to analyze these reviews, uncover underlying themes, and understand customer sentiments. This analysis will help improve customer service and product offerings.

The Data

You will be working with a dataset specifically focusing on customer reviews. Below is the data dictionary for the relevant field:

womens_clothing_e-commerce_reviews.csv

Column         Description
'Review Text'  Textual feedback provided by customers about their shopping experience and product quality.

Armed with access to powerful embedding API services, you will process the reviews, extract meaningful insights, and present your findings.

Let's get started!

Before you start

In order to complete the project, you may wish to use the OpenAI API. You can create a developer account with OpenAI and store your API key as an environment variable. Instructions for these steps are outlined below.

Create a developer account with OpenAI

  1. Go to the API signup page.

  2. Create your account (you'll need to provide your email address and your phone number).

  3. Go to the API keys page.

  4. Create a new secret key.

  5. Take a copy of it. (If you lose it, delete the key and create a new one.)

Add a payment method

OpenAI sometimes offers free API credits, but it is not clear whether they are available worldwide or under what conditions. You may need to add debit or credit card details.

At the time of writing, the API cost $0.002 per 1,000 tokens for GPT-3.5-turbo (embedding models are priced separately), and 1,000 tokens is roughly 750 words. This project should cost less than one US cent, but you are charged every time you rerun a task.
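The arithmetic behind that estimate can be sketched in a few lines. This is a rough calculation only, assuming the 750-words-per-1,000-tokens rule of thumb and the $0.002 per 1,000 tokens price quoted above; real token counts depend on the model's tokenizer and current prices may differ. The function names `estimate_tokens` and `estimate_cost_usd` are hypothetical.

```python
# Back-of-the-envelope API cost estimate.
# Assumptions taken from the rule of thumb above (not exact): about 750 words per
# 1,000 tokens and $0.002 per 1,000 tokens. Check OpenAI's pricing page for current rates.
WORDS_PER_1K_TOKENS = 750
PRICE_PER_1K_TOKENS_USD = 0.002


def estimate_tokens(texts):
    """Approximate the total token count of a list of strings from their word counts."""
    total_words = sum(len(text.split()) for text in texts)
    return total_words * 1000 / WORDS_PER_1K_TOKENS


def estimate_cost_usd(texts, price_per_1k_tokens=PRICE_PER_1K_TOKENS_USD):
    """Approximate the cost in US dollars of sending every text to the API once."""
    return estimate_tokens(texts) / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Made-up reviews for illustration only; they are not from the dataset.
    sample = ["Love this dress, fits perfectly.", "Runs small, had to return it."]
    print(f"~{estimate_tokens(sample):.0f} tokens, ~${estimate_cost_usd(sample):.6f}")
```

Once the reviews are loaded, passing the list of review texts to `estimate_cost_usd` gives a rough upper bound before you spend anything; multiply by the number of reruns you expect.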

  1. Go to the Payment Methods page.

  2. Click Add payment method.

  3. Fill in your card details.

Add an environment variable with your OpenAI key

  1. In Workspace, click "Environment" in the left sidebar.

  2. Click the plus button next to "Environment variables" to add an environment variable.

  3. In the "Name" field, type "OPENAI_API_KEY". In the "Value" field, paste your secret key.

  4. Click "Create". In the pop-up window that appears, click "Connect", then wait 5 to 10 seconds for the kernel to restart, or restart it manually from the Run menu.

Load OpenAI API key from environment variables

Environment variables can be referenced anywhere in the project while keeping their values secret, which makes them a good place to store passwords and other credentials.

# Initialize your API key
import os
openai_api_key = os.environ["OPENAI_API_KEY"]
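If the variable has not been set (or the kernel was not restarted after creating it), `os.environ[...]` fails with a bare `KeyError`. A slightly friendlier variant is sketched below; the helper name `get_openai_api_key` is hypothetical, and it deliberately never prints the key itself.

```python
import os


def get_openai_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or fail with a helpful message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set. "
            "Add it under Environment > Environment variables and restart the kernel."
        )
    return key


if __name__ == "__main__":
    try:
        openai_api_key = get_openai_api_key()
        print("API key loaded.")  # never print the key itself
    except RuntimeError as err:
        print(err)
```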

Install useful libraries

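# Pin the OpenAI Python client to version 1.3.0, installing it if it is missing or a different version is present
from importlib.metadata import version, PackageNotFoundError
try:
    assert version('openai') == '1.3.0'
except (PackageNotFoundError, AssertionError):
    !pip install openai==1.3.0
import openai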
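# Run this cell to install ChromaDB if desired
from importlib.metadata import version, PackageNotFoundError
try:
    assert version('chromadb') == '0.4.17'
except (PackageNotFoundError, AssertionError):
    !pip install chromadb==0.4.17
# ChromaDB needs SQLite 3.35+; pysqlite3-binary bundles a newer SQLite than some system Python builds.
# version() expects the pip distribution name (pysqlite3-binary), not the module name (pysqlite3).
try:
    assert version('pysqlite3-binary') == '0.5.2'
except (PackageNotFoundError, AssertionError):
    !pip install pysqlite3-binary==0.5.2
# Swap the standard-library sqlite3 module for pysqlite3 before importing ChromaDB
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
import chromadb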
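The install cells repeat the same check-then-install pattern. A hypothetical helper, `has_version`, could factor it out. It is a sketch built on `importlib.metadata.version`, which takes the distribution name used by pip (for example `pysqlite3-binary`) and raises `PackageNotFoundError` when that distribution is not installed. It only reports what would need installing and does not call pip itself.

```python
from importlib.metadata import version, PackageNotFoundError


def has_version(package, expected):
    """Return True if distribution `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The distribution is not installed at all.
        return False


if __name__ == "__main__":
    pins = [("openai", "1.3.0"), ("chromadb", "0.4.17"), ("pysqlite3-binary", "0.5.2")]
    for pkg, ver in pins:
        status = "OK" if has_version(pkg, ver) else f"needs: pip install {pkg}=={ver}"
        print(f"{pkg}: {status}")
```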

Load the dataset

Load the data and run basic checks to make sure it is suitable for the analysis.

# Load the reviews dataset. The index will serve as a unique ID for each review.
import os
import pandas as pd
reviews = pd.read_csv("womens_clothing_e-commerce_reviews.csv")

# Drop rows with no review text: astype(str) alone would turn NaN into the literal string "nan",
# which would then be embedded as if it were a real review. The original index labels are kept.
col = 'Review Text'
reviews = reviews.dropna(subset=[col])

# Extract all review texts into a list for bulk processing by the embedding API.
review_text = reviews[col].astype(str).tolist()

# Create an OpenAI client using the API key stored in the environment variable.
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Preview the first rows (a notebook cell only displays its last expression).
reviews.head()
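The loading cell reads the file but does not itself report any of the basic checks mentioned above. A minimal sketch of such checks on the text column follows: counts of missing, blank, and duplicated reviews plus the median review length in words. The function name `summarize_text_column` is hypothetical and the toy DataFrame is made up for illustration.

```python
import pandas as pd


def summarize_text_column(df, col="Review Text"):
    """Return simple quality checks for a text column as a dict."""
    text = df[col]
    non_missing = text.dropna().astype(str)
    word_counts = non_missing.str.split().str.len()
    return {
        "rows": len(df),
        "missing": int(text.isna().sum()),
        "blank": int((non_missing.str.strip() == "").sum()),
        "duplicates": int(non_missing.duplicated().sum()),
        "median_words": float(word_counts.median()) if len(word_counts) else 0.0,
    }


if __name__ == "__main__":
    # Toy data for illustration only; it is not taken from the dataset.
    toy = pd.DataFrame({"Review Text": ["Great fit", None, "Great fit", "  ", "Too long for me"]})
    print(summarize_text_column(toy))
```

On the real data, call it on the raw file, for example `summarize_text_column(pd.read_csv("womens_clothing_e-commerce_reviews.csv"))`, to see how many rows the missing-text handling affects.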

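The list of review texts is intended for bulk processing by the embedding API. One way to send it is in fixed-size batches, sketched below under stated assumptions: the helper name `embed_in_batches`, the default model `text-embedding-ada-002`, and the batch size of 100 are illustrative choices, not values given by the project. The sketch relies on the openai 1.x client call `client.embeddings.create(model=..., input=...)`, whose response holds a `data` list of items with `index` and `embedding` attributes. The demo uses a stand-in client so it runs without network access.

```python
from types import SimpleNamespace


def embed_in_batches(client, texts, model="text-embedding-ada-002", batch_size=100):
    """Embed `texts` with the embeddings endpoint, sending `batch_size` inputs per request.

    Returns one embedding (a list of floats) per input text, in the original order.
    """
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Sort by `index` so each embedding lines up with its input text.
        ordered = sorted(response.data, key=lambda item: item.index)
        embeddings.extend(item.embedding for item in ordered)
    return embeddings


class FakeEmbeddings:
    """Stand-in for client.embeddings that returns a 1-D 'embedding' equal to text length."""

    def create(self, model, input):
        data = [SimpleNamespace(index=i, embedding=[float(len(t))]) for i, t in enumerate(input)]
        return SimpleNamespace(data=data)


if __name__ == "__main__":
    fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
    print(embed_in_batches(fake_client, ["a", "bb", "ccc"], batch_size=2))
```

With the real client the call would be `embed_in_batches(client, review_text)`. Batching keeps each request small and makes it easy to add retries per batch; larger batches mean fewer requests.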
Functions