
Data Science in Sales: Customer Sentiment Analysis

Learn how data science can be used to analyze customer emotions and deliver valuable insights for sales optimization.
Mar 2022 · 9 min read


Data science use cases can be found in virtually any industry where large amounts of data are, or can be, accumulated. Stores and e-commerce websites are where people drawn in by marketing campaigns actually experience being customers, and where a brand or company collects valuable purchasing data. This is where people make the final decision on whether they really want to buy a certain product, whether they are interested in buying something else they hadn't planned to, how much they are willing to pay, whether they would return to the store, and what review they would leave about their experience.

Customer reviews, in particular, are a rich source of data for understanding what should be changed or reinforced across the sales process. Analyzing them can help reduce costs, improve operational efficiency and customer experience, uncover new opportunities, grow the business, and ultimately increase revenue. Let's take a closer look at how this information can be analyzed and modeled with data science algorithms to uncover hidden insights and capture the overall message from each individual customer.

Data Science Use Case in Sales: Analyzing Customer Sentiment 

Customer sentiment analysis is the automated process of identifying the emotions customers express when they use a company's products or services. Its input is usually unstructured text collected from online surveys, social media, support tickets, feedback forms, product reviews, forums, phone calls, emails, and chatbots. In machine learning, customer sentiment analysis relies on natural language processing (NLP), which applies statistical and linguistic methods to extract positive, negative, and neutral sentiment directly from the text. Essentially, it outputs two parameters:

  • Polarity: indicates whether a sentiment is positive or negative.
  • Magnitude: indicates the strength of that sentiment.
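To make these two outputs concrete, here is a minimal lexicon-based sketch. The tiny lexicon, its scores, and the function name score_sentiment are hypothetical, and summing word scores is only one simple way to derive polarity and magnitude; real NLP systems use much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon: positive words > 0, negative words < 0
LEXICON = {
    "great": 2.0, "love": 2.0, "good": 1.0, "fine": 0.5,
    "bad": -1.0, "awful": -2.0, "hate": -2.0, "boring": -1.0,
}


def score_sentiment(text: str) -> tuple[str, float]:
    """Return (polarity, magnitude) for a piece of text.

    polarity:  sign of the summed word scores ('positive', 'negative' or 'neutral')
    magnitude: sum of the absolute word scores, i.e. how much emotion is expressed
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    total = sum(scores)
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    magnitude = sum(abs(s) for s in scores)
    return polarity, magnitude


print(score_sentiment("I love this phone, the camera is great"))  # ('positive', 4.0)
print(score_sentiment("Good screen but awful battery"))           # ('negative', 3.0)
```

The second review mixes emotions: its polarity is negative overall, but its magnitude reflects both the praise and the complaint.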

Customer sentiment analysis is a key tool for any modern business, since it helps companies obtain actionable insights, spot and fix critical recurring issues that make customers unhappy, reinforce the product or service features that evoke positive emotions, and make better data-driven decisions overall. On a more granular level, customer sentiment analysis allows us to:

  • improve customer service and hence customer experiences,
  • increase customer loyalty,
  • reduce churn rate,
  • upgrade products and services in a timely manner,
  • optimize marketing campaigns,
  • anticipate new trends and markets,
  • maintain our company’s high reputation,
  • increase profits.

As with any text analysis task, customer sentiment analysis has its pitfalls. For example, an NLP algorithm may fail to detect sarcasm in a review and classify it incorrectly. It can also fail to decipher very specific abbreviations or rarely used slang.

Preparing the Dataset

Let's explore how customer sentiment analysis works in practice using a dataset of IMDB movie reviews:

import pandas as pd

movies = pd.read_csv('movies.csv', index_col=0).reset_index(drop=True)
print(f'Number of reviews: {movies.shape[0]:,}\n')
print(movies.head())
Number of reviews: 7,501

                                              review  label
0  This short spoof can be found on Elite's Mille...      0
1  A singularly unfunny musical comedy that artif...      0
2  An excellent series, masterfully acted and dir...      1
3  The master of movie spectacle Cecil B. De Mill...      1
4  I was gifted with this movie as it had such a ...      0

We have two columns: one with the text of each review, and the other with its overall sentiment label: positive (1) or negative (0).

Let's calculate the percentage of positive and negative reviews:

round(movies['label'].value_counts()*100/len(movies['label'])).convert_dtypes()
0    50
1    50
Name: label, dtype: Int64

Hence, positive and negative reviews occur in nearly equal proportions, so the classes are balanced.
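The same percentages can be computed a little more directly with the normalize=True option of value_counts. This sketch uses a small, made-up label Series in place of movies['label'] so that it runs on its own:

```python
import pandas as pd

# Hypothetical labels standing in for movies['label'] (1 = positive, 0 = negative)
labels = pd.Series([1, 0, 0, 1, 1, 0, 1, 0], name="label")

# normalize=True returns proportions instead of counts; scale them to percentages
percentages = labels.value_counts(normalize=True).mul(100).round()
print(percentages)
```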

Applying the BOW Method 

Our next step is to transform the text into numeric form, since machine learning models such as logistic regression expect numeric features. Specifically, we'll create features that count how many times each word occurs in each review. The most basic and straightforward approach for this is called bag-of-words (BOW): it builds a vocabulary of all the words occurring across the reviews and counts how often each word appears in each review. As a result, we obtain one new feature per vocabulary word, holding the corresponding counts.
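The idea behind BOW fits in a few lines of plain Python. This sketch is not how scikit-learn implements it internally; it only mimics CountVectorizer's default tokenization (lowercasing, and keeping tokens of two or more word characters), and the two reviews are made up:

```python
import re
from collections import Counter

# Two hypothetical reviews
reviews = ["A great movie, great cast!", "Not a great plot"]


def tokenize(text: str) -> list[str]:
    # Lowercase and keep tokens of two or more word characters ("a" is dropped)
    return re.findall(r"\b\w\w+\b", text.lower())


# 1. Build a sorted vocabulary of every word seen across all reviews
vocabulary = sorted({word for review in reviews for word in tokenize(review)})

# 2. Count each vocabulary word in each review (one row per review)
counts = [Counter(tokenize(review)) for review in reviews]
matrix = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)  # ['cast', 'great', 'movie', 'not', 'plot']
print(matrix)      # [[1, 2, 1, 0, 0], [0, 1, 0, 1, 1]]
```

Each row is one review and each column one vocabulary word, which is exactly the shape of the dataframe built below.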

Let's apply the BOW method to our dataset:

from sklearn.feature_extraction.text import CountVectorizer

# Creating new features
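# Keep only the 200 most frequent words across all reviews
vect = CountVectorizer(max_features=200)
vect.fit(movies.review)
X_review = vect.transform(movies.review)  # sparse matrix: one row per review, one column per word
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
X_df = pd.DataFrame(X_review.toarray(), columns=vect.get_feature_names_out())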

# Combining the new features with the label
movies_bow = pd.concat([movies['label'], X_df], axis=1)
print(movies_bow.head())
  label  10  about  acting  action  actors  actually  after  again  all  ...  \
0      0   0      0       0       0       0         0      0      0    0  ...  
1      0   1      0       1       0       1         0      0      0    3  ...  
2      1   0      0       0       0       0         0      1      0    0  ...  
3      1   0      0       0       1       0         0      0      0    0  ...  
4      0   1      0       0       1       0         0      0      0    3  ...  

  will  with  without  work  world  would  years  you  young  your 
0     0     1        0     0      0      1      0    0      0     0 
1     2     7        1     0      0      2      0    3      0     2 
2     0     2        0     0      0      0      0    0      1     0 
3     0     0        0     0      0      0      0    1      1     0 
4     0     2        0     1      0      0      0    0      0     0 

[5 rows x 201 columns]

Above, we set the optional parameter max_features to keep only the 200 most frequent words across the reviews, which keeps the feature set small and reduces the risk of overfitting.

Using a Supervised Machine Learning Model to Predict Sentiment

Now we'll use a supervised machine learning model to predict the sentiment. Since we want to estimate whether a new review belongs to the positive or negative category based on already labeled reviews, this is a classification problem. Let's train a logistic regression model, evaluate it on a held-out test set, and measure its accuracy (the confusion matrix below is expressed as a percentage of the test set):

Accuracy score: 0.754

Confusion matrix:
[[37.62772101 13.23856064]
[11.32829853 37.80541981]]

The model labeled about 11% of the test reviews as negative even though they were positive (false negatives), and about 13% as positive even though they were negative (false positives). To improve the model accuracy, we could exclude stopwords (i.e., low-information words that occur very frequently, such as "about", "will", "you", etc.) and increase the size of the vocabulary.
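The code for this step isn't shown above. Here is a minimal, self-contained sketch of the same workflow (BOW features, a train/test split, logistic regression, accuracy, and a confusion matrix in percent) on a dozen made-up reviews. test_size=0.3 is an assumption, chosen because the percentages above correspond to whole numbers of reviews out of 2,251, i.e. 30% of 7,501 rounded up; random_state and stratify are illustrative choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy reviews standing in for movies.csv (1 = positive, 0 = negative)
reviews = [
    "a wonderful film with great acting",
    "great story and a wonderful cast",
    "loved this movie, brilliant and moving",
    "brilliant direction and great performances",
    "a moving and wonderful experience",
    "loved the cast, great fun",
    "terrible plot and awful acting",
    "boring, awful and far too long",
    "hated this movie, a terrible waste",
    "awful script and boring characters",
    "a terrible, boring mess",
    "hated it, the worst film this year",
]
labels = [1] * 6 + [0] * 6

# Bag-of-words features, as in the CountVectorizer step above
X = CountVectorizer().fit_transform(reviews)

# Hold out 30% of the reviews for testing (assumed split; stratify keeps both classes)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42, stratify=labels
)

lr = LogisticRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)

# Rows = true labels (0, 1), columns = predicted labels (0, 1), as % of the test set
cm_percent = confusion_matrix(y_test, predictions) / len(y_test) * 100

print(f"Accuracy score: {accuracy_score(y_test, predictions):.3f}")
print(f"Confusion matrix, %:\n{cm_percent}")
```

In this layout the top-right cell holds false positives and the bottom-left cell false negatives, which is how the percentages above were read.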

When we apply the BOW method, we may end up with hundreds or even thousands of new features in our dataframe. This can produce an overly complex, overfitted model with too many unnecessary features and parameters. One way to address this is regularization, which adds a penalty on large coefficients to the model's loss function. The parameter to tune in scikit-learn's LogisticRegression is C, the inverse of the regularization strength: smaller values of C mean stronger regularization. Let's test two values of this parameter, 100 and 0.1, and see which one gives the better model performance on the test data:

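from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# X_train, X_test, y_train, y_test: the train/test split of movies_bow used above (not shown)

# Large C = weak regularization
lr_1 = LogisticRegression(C=100)
lr_1.fit(X_train, y_train)
predictions_1 = lr_1.predict(X_test)

# Small C = strong regularization
lr_2 = LogisticRegression(C=0.1)
lr_2.fit(X_train, y_train)
predictions_2 = lr_2.predict(X_test)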

print(f'Accuracy score, lr_1 model: {round(accuracy_score(y_test, predictions_1), 3)}\n'
      f'Accuracy score, lr_2 model: {round(accuracy_score(y_test, predictions_2), 3)}\n\n'
      f'Confusion matrix for lr_1 model, %:\n{confusion_matrix(y_test, predictions_1)/len(y_test)*100}\n\n'
      f'Confusion matrix for lr_2 model, %:\n{confusion_matrix(y_test, predictions_2)/len(y_test)*100}')
Accuracy score, lr_1 model: 0.753
Accuracy score, lr_2 model: 0.756

Confusion matrix for lr_1 model, %:
[[37.53887161 13.32741004]
[11.32829853 37.80541981]]

Confusion matrix for lr_2 model, %:
[[37.67214571 13.19413594]
[11.19502443 37.93869391]]

The difference in accuracy between the two values of C is negligible (0.753 vs. 0.756). Experimenting with more values might reveal a C that improves performance further. However, with only 200 features our model is not that complex, so regularization is not really needed in this case.
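One way to see what C does is to compare the size of the learned coefficients. In this sketch (made-up reviews again, with the L2 norm of coef_ as our measure of size), the model with C=0.1, i.e. stronger regularization, ends up with smaller coefficients than the model with C=100:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy reviews (1 = positive, 0 = negative)
reviews = [
    "great film, great cast", "wonderful and moving", "loved it, brilliant",
    "awful and boring", "terrible plot, awful acting", "hated it, boring",
]
labels = [1, 1, 1, 0, 0, 0]
X = CountVectorizer().fit_transform(reviews)

norms = {}
for C in (100, 0.1):
    # Smaller C means a stronger L2 penalty, pulling coefficients toward zero
    model = LogisticRegression(C=C, max_iter=1000).fit(X, labels)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C}: L2 norm of coefficients = {norms[C]:.3f}")
```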

Instead of predicting labels 0 or 1 with the predict method, we can predict the probability of each sentiment class with predict_proba. Keep in mind that the accuracy score and confusion matrix cannot be applied directly to predicted probabilities, since these metrics work only with class labels; the probabilities must first be converted into classes. By default, predict assigns class 1 when the predicted probability of class 1 is above 0.5, and class 0 otherwise.

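from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)

# One column per class, in the order of lr.classes_, i.e. [0, 1]
probabilities = lr.predict_proba(X_test)

# Predicted probability of the 0 (negative) class
predictions_prob_0 = probabilities[:, 0]

# Predicted probability of the 1 (positive) class
predictions_prob_1 = probabilities[:, 1]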

print(f'First 10 predicted probabilities of class 0: {predictions_prob_0[:10].round(3)}\n'
      f'First 10 predicted probabilities of class 1: {predictions_prob_1[:10].round(3)}')
First 10 predicted probabilities of class 0: [0.246 0.143 0.123 0.708 0.001 0.828 0.204 0.531 0.121 0.515]
First 10 predicted probabilities of class 1: [0.754 0.857 0.877 0.292 0.999 0.172 0.796 0.469 0.879 0.485]
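Converting the probabilities into classes yourself makes the 0.5 cut-off explicit and lets you choose a different one. In this minimal sketch on made-up reviews, a threshold of 0.5 reproduces predict, while a stricter, hypothetical threshold of 0.7 labels a review positive only when the model is more confident:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy reviews (1 = positive, 0 = negative)
reviews = [
    "great film, great cast", "wonderful and moving", "loved it, brilliant",
    "awful and boring", "terrible plot, awful acting", "hated it, boring",
]
labels = [1, 1, 1, 0, 0, 0]
X = CountVectorizer().fit_transform(reviews)
lr = LogisticRegression().fit(X, labels)

# Probability of the positive class (column 1, since lr.classes_ is [0, 1])
prob_1 = lr.predict_proba(X)[:, 1]

# Same rule as predict: class 1 when the probability is above 0.5
default_classes = (prob_1 > 0.5).astype(int)

# Stricter, hypothetical threshold: require 70% confidence to call a review positive
strict_classes = (prob_1 > 0.7).astype(int)

print(prob_1.round(3))
print(default_classes, lr.predict(X))
print(strict_classes)
```

A higher threshold trades some missed positive reviews for fewer false positives; which trade-off is right depends on how the predictions will be used.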

Conclusion 

There are many other helpful approaches that we can apply to our dataset to conduct a more granular sentiment analysis:

  • using n-grams (sequences of adjacent words) instead of single words to preserve some context,
  • excluding stopwords,
  • limiting the size of the vocabulary based on upper or lower frequency thresholds,
  • creating extra numeric features describing the length of each review or its number of punctuation marks (the latter can sometimes correlate with the magnitude of the sentiment),
  • excluding numbers, certain characters, or words of a certain length, or matching more complex word patterns,
  • applying stemming and lemmatization, i.e., reducing words to their root or base forms,
  • using a more sophisticated weighting scheme than raw BOW counts, e.g., TF-IDF (term frequency-inverse document frequency), which weights how often a word occurs in a review by how rare it is across all reviews,
  • using specialized libraries and lexicons designed for sentiment analysis, such as TextBlob, SentiWordNet, and VADER (Valence Aware Dictionary and Sentiment Reasoner).

If you're interested in exploring these and other useful techniques for conducting insightful sentiment analysis, check out the course Sentiment Analysis in Python.
