Mohamed Amir/

Google Play Store Apps Data



This dataset consists of web scraped data of more than 10,000 Google Play Store apps and 60,000 app reviews. apps_data.csv consists of data about the apps such as category, number of installs, and price. review_data.csv holds reviews of the apps, including the text of the review and sentiment scores. You can join the two tables on the App column.
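As a minimal sketch of the join on the App column, the snippet below uses two tiny stand-in tables. The names `apps_demo` and `reviews_demo` and all values in them are hypothetical and only illustrate the shape of the real files.

```python
import pandas as pd

# Tiny stand-in tables (hypothetical values, not taken from the dataset)
apps_demo = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Category": ["TOOLS", "GAME", "TOOLS"],
})
reviews_demo = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta"],
    "Sentiment": ["Positive", "Negative", "Positive"],
})

# Inner join on the shared App column: one row per review,
# with the app's metadata attached. Apps without reviews drop out.
merged = apps_demo.merge(reviews_demo, on="App", how="inner")
print(merged)
```

With an inner join, apps that have no reviews are dropped; `how="left"` would keep them with missing review fields. If an app name appears more than once in the apps table, each of its reviews is repeated for every matching row, so it is worth de-duplicating first.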

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Data Dictionary


apps_data.csv

| Column | Type | Description |
|---|---|---|
| App | character | The application name |
| Category | character | The category the app belongs to |
| Rating | numeric | Overall user rating of the app |
| Reviews | numeric | Number of user reviews for the app |
| Size | character | The size of the app |
| Installs | character | Number of user installs for the app |
| Type | character | Either "Paid" or "Free" |
| Price | character | Price of the app |
| Content Rating | character | The age group the app is targeted at - "Children" / "Mature 21+" / "Adult" |
| Genres | character | Possibly multiple genres the app belongs to |
| Last Updated | character | The date the app was last updated |
| Current Ver | character | The current version of the app |
| Android Ver | character | The Android version needed for this app |


review_data.csv

| Column | Type | Description |
|---|---|---|
| App | character | The application name |
| Translated_Review | character | User review (translated to English) |
| Sentiment | character | The sentiment of the user - Positive/Negative/Neutral |
| Sentiment_Polarity | character | The sentiment polarity score |
| Sentiment_Subjectivity | character | The sentiment subjectivity score |

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which categories get the highest reviews from amongst the 10 most popular categories?
  • 📊 Visualize: Create a plot visualizing the distribution of sentiment polarity, split by content rating.
  • 🔎 Analyze: What impact does the content rating an app receives have on its sentiment and rating?

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working for an app developer. They are in the process of brainstorming a new app. They want to ensure that their next app scores a high review on the app store, as this can lead to the app being featured on the store homepage. They would like you to analyze what factors increase the rating an app will receive. They would also like to know what impact reviews have on the final score.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.

✍️ If you have an idea for an interesting Scenario or Challenge, or have feedback on our existing ones, let us know! You can submit feedback by pressing the question mark in the top right corner of the screen and selecting "Give Feedback". Include the phrase "Content Feedback" to help us flag it in our system.

# Read in datasets
apps = pd.read_csv('apps_data.csv')
reviews = pd.read_csv('review_data.csv')
# Since there are duplicates we drop duplicates from apps
apps = apps.drop_duplicates()

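print('The total number of apps in the dataset is', len(apps))
# Take a random sample of 5 rows
print(apps.sample(5))

# Clean the Installs and Price columns by stripping '$', '+' and ','
chars_to_remove = ['$', '+', ',']

cols_to_clean = ['Installs', 'Price']

for col in cols_to_clean:
    for char in chars_to_remove:
        # regex=False treats '$' and '+' as literal characters
        apps[col] = apps[col].str.replace(char, '', regex=False)

# Check these columns again for entries that are still non-numeric
for col in cols_to_clean:
    not_numeric = pd.to_numeric(apps[col], errors='coerce').isna()
    print(col, apps.loc[not_numeric, col].unique())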


Looking at the above, we see that some rows contain words like 'Free' and 'Everyone', which need to be removed.
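One way to remove such rows, sketched here on a small made-up table rather than the real data, is to coerce the columns to numbers and drop whatever fails to convert. The frame `apps_demo` and its values are hypothetical, and this is only one possible approach, not necessarily the one the notebook goes on to use.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned apps table; values are made up
apps_demo = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Installs": ["1000", "Free", "500000"],
    "Price": ["0", "Everyone", "4.99"],
})

cols_to_clean = ["Installs", "Price"]

# Convert to numbers; anything that is not numeric becomes NaN
for col in cols_to_clean:
    apps_demo[col] = pd.to_numeric(apps_demo[col], errors="coerce")

# Drop rows where either column failed to convert
apps_demo = apps_demo.dropna(subset=cols_to_clean).reset_index(drop=True)
print(apps_demo.dtypes)
```

Note that this also drops rows where Installs or Price was already missing before the conversion, so it is worth comparing the row count before and after.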
