Project: What Makes a Good Book?

Identifying popular products is incredibly important for e-commerce companies! Popular products generate more revenue and, therefore, play a key role in stock control.

You've been asked to support an online bookstore by building a model to predict whether a book will be popular or not. They've supplied you with an extensive dataset containing information about all books they've sold, including:

price
popularity (target variable)
review/summary
review/text
review/helpfulness
authors
categories

You'll need to build a model that predicts whether a book will be rated as popular or not.

They have high expectations and have set a target of at least 70% accuracy. You are free to use as many features as you like, and you will need to engineer new features to reach this level of performance.

# Import some required packages
import pandas as pd

# Read in the dataset
books = pd.read_csv('data/books.csv')

# Preview the first five rows
books.head()

1 - Perform EDA

# Inspecting a DataFrame
books.info()

# Understanding distributions and frequencies
import matplotlib.pyplot as plt
import seaborn as sns

# Visualize popularity frequencies
sns.countplot(data=books, x='popularity')
plt.show()
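
The 70% accuracy target is easier to judge against the accuracy a model would reach by always predicting the most common class. The following is a minimal sketch of that check, not part of the original notebook. It uses a small made-up DataFrame (`toy`) in place of `books`, and the label values 'Popular' and 'Unpopular' are assumptions, since the real label values are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for `books`; the label values are assumptions
toy = pd.DataFrame({
    "popularity": ["Popular", "Unpopular", "Popular", "Popular", "Unpopular"]
})

# Share of each class, as fractions that sum to 1
class_share = toy["popularity"].value_counts(normalize=True)

# Accuracy obtained by always predicting the most frequent class
baseline_accuracy = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

If the same calculation on `books['popularity']` gives a baseline close to 70%, accuracy alone says little about how much the model has learned.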

# Visualize price distribution
sns.histplot(data=books, x='price')
plt.show()

# Check frequencies
print(books['categories'].value_counts())

print(books['categories'].value_counts().values)

less = books['categories'].value_counts().values

# Total number of books that fall in categories with fewer than 100 books
less[less < 100].sum()

# Total number of books that fall in categories with more than 100 books
less[less > 100].sum()
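
The sums above add up the value counts, so they give the number of books that fall in rare or common categories rather than the number of categories. The sketch below separates the two quantities. It assumes a made-up Series (`toy_categories`) in place of `books['categories']` and a threshold of 3 instead of 100, so that the toy data stays small.

```python
import pandas as pd

# Hypothetical stand-in for books['categories']
toy_categories = pd.Series(
    ["Fiction"] * 5 + ["History"] * 3 + ["Travel"] * 2 + ["Poetry"] * 1
)

# Number of rows per category, most frequent first
counts = toy_categories.value_counts()

threshold = 3  # the notebook uses 100; smaller here to fit the toy data

# Categories whose row count is below the threshold
rare = counts[counts < threshold]

n_rare_categories = len(rare)   # how many categories are rare
n_rare_books = int(rare.sum())  # how many rows belong to those categories

print(f"Rare categories: {n_rare_categories}, books in them: {n_rare_books}")
```

Here `len(rare)` counts categories, while `rare.sum()` counts the rows that would be lost by dropping them.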

# Number of non-null titles per category
books.groupby('categories').agg({'title': 'count'})

# Filter out rare categories to avoid overfitting
books = books.groupby('categories').filter(lambda x: len(x) > 100)
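
One possible alternative to `filter` with a lambda is a boolean mask built from the per-row category size, which avoids calling a Python function once per group. This is a sketch under stated assumptions, not the notebook's method: `toy`, its values, and `min_count = 2` (in place of 100) are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `books`
toy = pd.DataFrame({
    "categories": ["Fiction"] * 4 + ["History"] * 2 + ["Poetry"],
    "price": [10.0, 12.5, 8.0, 9.5, 20.0, 18.0, 5.0],
})

min_count = 2  # the notebook uses 100; smaller here to fit the toy data

# For every row, the number of rows sharing its category
category_size = toy.groupby("categories")["categories"].transform("size")

# Keep rows whose category has more than min_count rows
filtered = toy[category_size > min_count]

# The same selection written with groupby().filter(), for comparison
via_filter = toy.groupby("categories").filter(lambda g: len(g) > min_count)

print(filtered)
print(f"Rows kept: {len(filtered)} of {len(toy)}")
```

On this toy data both approaches keep the same rows. Printing the row count before and after is a quick way to confirm how much data the filter removes.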

