What makes a good book?
📚 Background
As reading trends keep changing, an ambitious online bookstore has seen a boost in popularity following an intense marketing campaign. Excited to keep the momentum going, the bookstore has kicked off a challenge among their data scientists.
They've equipped the team with a comprehensive dataset featuring book prices, reviews, author details, and categories.
The team is all set to tackle this challenge, aiming to build a model that accurately predicts book popularity, which will help the bookstore manage their stock better and tweak their marketing plans to suit what their readers love the most.
Help them get the best predictions!
You are free to use any methodologies that you like in order to produce your insights.
📊 The Data
They have provided you with a single dataset to use. A summary and preview are provided below.
books.csv
Column | Description |
---|---|
'title' | Book title. |
'price' | Book price. |
'review/helpfulness' | The number of helpful reviews over the total number of reviews. |
'review/summary' | The summary of the review. |
'review/text' | The review's full text. |
'description' | The book's description. |
'authors' | Author. |
'categories' | Book categories. |
'popularity' | Whether the book was popular or unpopular. |
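The 'review/helpfulness' column is described as helpful reviews over total reviews, so it may be easier to model as a single ratio. The sketch below assumes the values are stored as text of the form "helpful/total" (for example "2/4"); that format is an assumption and should be checked against the preview. The helper name `helpfulness_ratio` is hypothetical.

```python
def helpfulness_ratio(value):
    """Convert a 'helpful/total' string to a float in [0, 1].

    Assumes the column is stored as text such as "2/4" (not confirmed by
    the brief). Returns None when the value cannot be parsed or when the
    total is zero, so the caller can decide how to impute it.
    """
    try:
        helpful, total = (int(part) for part in str(value).split("/"))
    except ValueError:
        # Covers missing values, non-numeric parts, and a missing "/".
        return None
    if total <= 0:
        return None
    return helpful / total


# Small hand-written examples (not taken from books.csv).
examples = ["2/4", "0/0", "n/a"]
ratios = [helpfulness_ratio(v) for v in examples]
```

If the assumption holds, something like `books["review/helpfulness"].apply(helpfulness_ratio)` would produce a numeric feature, with the `None` entries left for imputation.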
💪 The Challenge
- Use your skills to find the most popular books.
- You can use any predictive model to solve the problem of categorizing books as popular or unpopular.
- Use the accuracy score as your metric to optimize, aiming for at least 70% accuracy on a held-out test set.
- You may also wish to use feature engineering to pre-process the data.
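One possible starting point for the points above is a text-only baseline scored with accuracy on a held-out split. This is a minimal sketch, not a recommended solution: the choice of TF-IDF with logistic regression, the function name `evaluate_text_baseline`, the split size, and the toy data are all assumptions made for illustration. It needs scikit-learn installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_text_baseline(texts, labels, test_size=0.25, seed=0):
    """Fit TF-IDF + logistic regression and return accuracy on a test split."""
    # Stratify so both classes appear in the train and test portions.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Toy stand-in data (NOT from books.csv), only to show the call shape.
# The label strings are assumed; check the actual values in 'popularity'.
toy_texts = ["great story loved it", "boring plot disliked it"] * 10
toy_labels = ["Popular", "Unpopular"] * 10
toy_accuracy = evaluate_text_baseline(toy_texts, toy_labels)
```

On the real data, the same function could be called with a text column such as `books["review/text"].fillna("")` and `books["popularity"]`, then compared against the 70% target. Comparing it with the share of the most common class is a useful sanity check, since accuracy can look high on imbalanced labels.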
✍️ Judging criteria
This competition is meant to help you understand how competitions work. It will not be judged.
✅ Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your work.
- Check that all the cells run without error.
⌛️ Time is ticking. Good luck!
# Import some required packages
import pandas as pd
# Read in the dataset
books = pd.read_csv("data/books.csv")
# Preview the first five rows
books.head()