
Google Play Store Apps Data

This dataset (source) consists of two tables. The first contains data about the applications on the Google Play Store themselves, such as category, number of installs and genre. The second contains data about user reviews of those apps, such as the review text and its sentiment. You can either join the two tables on their shared key, App, or keep them separate for your data analysis.

# Load packages
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

Load your data

The dataset consists of two separate tables. We will load them separately and then join them on the App key. During your analysis, you can choose which of the three resulting tables (apps, reviews, or the merged table) to use.

# Load data from the googleplaystore.csv file
apps_data = pd.read_csv('apps_data.csv', index_col=0)
apps_data.head()
App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up
U Launcher Lite – FREE Live Cool Themes, Hide Apps | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up
Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up
Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up
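Several columns that look numeric are stored as text: `Installs` holds values like `500,000+`, `Price` holds `0` or `$4.99`, and `Size` holds `14M` or `Varies with device`. They need converting before you can compare or average them. Below is a minimal sketch of one way to do that. `clean_apps` and `Size_MB` are hypothetical names, the first five sample rows reuse values from the output above while the last two are made up, and the handling of a `k` (kilobyte) size suffix is an assumption about values not shown here. Keep in mind that `Installs` values are lower bounds of install buckets, not exact counts.

```python
import pandas as pd


def clean_apps(apps: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the apps table with numeric Reviews, Installs, Price and Size_MB.

    Hypothetical helper: values that do not match the expected pattern become NaN.
    """
    out = apps.copy()
    out["Reviews"] = pd.to_numeric(out["Reviews"], errors="coerce")
    # "500,000+" -> 500000: strip the thousands separators and the trailing "+"
    out["Installs"] = pd.to_numeric(
        out["Installs"].astype(str).str.replace(r"[,+]", "", regex=True), errors="coerce"
    )
    # "$4.99" -> 4.99; free apps are stored as "0"
    out["Price"] = pd.to_numeric(
        out["Price"].astype(str).str.replace("$", "", regex=False), errors="coerce"
    )
    # "14M" -> 14.0 MB and (assumed format) "512k" -> 0.5 MB; "Varies with device" -> NaN
    size = out["Size"].astype(str)
    factor = size.str[-1].map({"M": 1.0, "k": 1 / 1024})
    out["Size_MB"] = pd.to_numeric(size.str[:-1], errors="coerce") * factor
    return out


# First five rows copied from apps_data.head(); the last two rows are made up.
sample = pd.DataFrame(
    {
        "Reviews": ["159", "967", "87510", "215644", "967", "12", "3"],
        "Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+", "1,000+", "100+"],
        "Price": ["0", "0", "0", "0", "0", "$4.99", "0"],
        "Size": ["19M", "14M", "8.7M", "25M", "2.8M", "Varies with device", "512k"],
    }
)
cleaned = clean_apps(sample)
print(cleaned[["Reviews", "Installs", "Price", "Size_MB"]])
```

Using `errors="coerce"` keeps the conversion from failing on an unexpected value; check how many `NaN`s it produces afterwards to make sure nothing important was silently discarded.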
# Load data from the googleplaystore_user_reviews.csv file
review_data = pd.read_csv('review_data.csv', index_col=0)
review_data.head()
App | Translated_Review | Sentiment | Sentiment_Polarity | Sentiment_Subjectivity
--- | --- | --- | --- | ---
10 Best Foods for You | I like eat delicious food. That's I'm cooking ... | Positive | 1.00 | 0.533333
10 Best Foods for You | This help eating healthy exercise regular basis | Positive | 0.25 | 0.288462
10 Best Foods for You | NaN | NaN | NaN | NaN
10 Best Foods for You | Works great especially going grocery store | Positive | 0.40 | 0.875000
10 Best Foods for You | Best idea us | Positive | 1.00 | 0.300000
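The `head()` output above contains a row in which all four review columns are `NaN`. Here is a minimal sketch, using rows copied from that output, of counting missing values per column and dropping rows without review text before any text analysis. Whether to drop or keep such rows is a judgment call for your analysis.

```python
import numpy as np
import pandas as pd

# Four rows copied from review_data.head(), indexed by App like the real table.
reviews = pd.DataFrame(
    {
        "Translated_Review": [
            "This help eating healthy exercise regular basis",
            np.nan,
            "Works great especially going grocery store",
            "Best idea us",
        ],
        "Sentiment": ["Positive", np.nan, "Positive", "Positive"],
        "Sentiment_Polarity": [0.25, np.nan, 0.40, 1.00],
        "Sentiment_Subjectivity": [0.288462, np.nan, 0.875000, 0.300000],
    },
    index=pd.Index(["10 Best Foods for You"] * 4, name="App"),
)

# Number of missing values in each column
print(reviews.isna().sum())

# Keep only the rows that actually contain review text
with_text = reviews.dropna(subset=["Translated_Review"])
print(len(reviews), "->", len(with_text))
```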
# Merge the two dataframes
data = pd.merge(apps_data, review_data, on="App")
data.head()
App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver | Translated_Review | Sentiment | Sentiment_Polarity | Sentiment_Subjectivity
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | A kid's excessive ads. The types ads allowed a... | Negative | -0.250 | 1.000000
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | It bad >:( | Negative | -0.725 | 0.833333
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | like | Neutral | 0.000 | 0.000000
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | NaN | NaN | NaN | NaN
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | I love colors inspyering | Positive | 0.500 | 0.600000
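`pd.merge` performs an inner join by default (`how="inner"`), so apps with no rows in the review table are left out of `data`. That is consistent with the merged `head()` starting at *Coloring book moana* rather than at the first app in `apps_data`. If you want to keep every app, a left join with `indicator=True` adds a `_merge` column that marks which rows found a match. The sketch below uses two apps and three sentiments copied from the outputs above; the small tables themselves are only illustrations.

```python
import pandas as pd

# Two apps from apps_data.head(), indexed by App as in the notebook.
apps = pd.DataFrame(
    {"Rating": [4.1, 3.9]},
    index=pd.Index(
        ["Photo Editor & Candy Camera & Grid & ScrapBook", "Coloring book moana"], name="App"
    ),
)
# Three review sentiments for one of them, from the merged head() output.
reviews = pd.DataFrame(
    {"Sentiment": ["Negative", "Negative", "Neutral"]},
    index=pd.Index(["Coloring book moana"] * 3, name="App"),
)

# Default inner join: the app without reviews disappears.
inner = pd.merge(apps, reviews, on="App")

# Left join keeps every app; _merge is "left_only" where no review matched.
left = pd.merge(apps, reviews, on="App", how="left", indicator=True)
print(left["_merge"].value_counts())
```

A left join also means apps without reviews get `NaN` in every review column, which matters for the missing-value counts in later steps.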

Understand your variables

# Understand your variables: for each column, record its name,
# the number of distinct values, and the distinct values themselves
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])

for i, var in enumerate(data.columns):
    variables.loc[i] = [var, data[var].nunique(), data[var].unique().tolist()]

variables
 | Variable | Number of unique values | Values
--- | --- | --- | ---
0 | Category | 33 | [ART_AND_DESIGN, FAMILY, AUTO_AND_VEHICLES, BE...
1 | Rating | 23 | [3.9, 4.4, 4.3, 4.1, 4.7, 4.5, 4.2, 4.9, 4.6, ...
2 | Reviews | 1330 | [967, 974, 13791, 1518, 194216, 654, 20260, 20...
3 | Size | 178 | [14M, 33M, 37M, 39M, 12M, 25M, 6.1M, 11M, Vari...
4 | Installs | 13 | [500,000+, 1,000,000+, 100,000+, 5,000,000+, 1...
5 | Type | 2 | [Free, Paid]
6 | Price | 14 | [0, $6.99, $1.99, $4.99, $3.99, $2.99, $11.99,...
7 | Content Rating | 5 | [Everyone, Teen, Mature 17+, Everyone 10+, Adu...
8 | Genres | 73 | [Art & Design;Pretend Play, Art & Design, Art ...
9 | Last Updated | 299 | [January 15, 2018, September 20, 2017, August ...
10 | Current Ver | 598 | [2.0.0, 2.9.2, 1.2.3, 2.2.5, 1.1, 1.0.8, 1.03,...
11 | Android Ver | 25 | [4.0.3 and up, 3.0 and up, 2.3 and up, 4.0 and...
12 | Translated_Review | 26682 | [A kid's excessive ads. The types ads allowed ...
13 | Sentiment | 3 | [Negative, Neutral, nan, Positive]
14 | Sentiment_Polarity | 5295 | [-0.25, -0.7249999999999999, 0.0, nan, 0.5, -0...
15 | Sentiment_Subjectivity | 4382 | [1.0, 0.8333333333333333, 0.0, nan, 0.6, 0.9, ...
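In the summary above, `Sentiment` reports 3 unique values while its list shows four entries. This is because `nunique()` ignores missing values by default (`dropna=True`), whereas `unique()` includes `NaN`. The same applies to `Sentiment_Polarity` and `Sentiment_Subjectivity`, whose lists also contain `nan`. A small illustration on a hand-made Series:

```python
import numpy as np
import pandas as pd

sentiment = pd.Series(["Negative", "Neutral", np.nan, "Positive", "Negative"])

print(sentiment.nunique())              # 3: NaN is not counted by default
print(sentiment.nunique(dropna=False))  # 4: NaN counted as a value of its own
print(sentiment.unique().tolist())      # ['Negative', 'Neutral', nan, 'Positive']
print(sentiment.isna().sum())           # 1 missing value
```

To see the missing values of the whole merged table at once, `data.isna().sum()` counts them per column.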

Answer interesting questions:

Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:

  • Which genres tend to get higher ratings?
  • Is there a substantial difference between the number of downloads (installs) of free apps and paid apps?
  • Are people more likely to review bad applications or good applications?
  • Does the average number of words in a review correlate with the overall rating of an application?
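One possible starting point for three of these questions is sketched below. It assumes an apps table with a numeric `Installs` column (the raw column holds strings such as `500,000+`) and the merged table indexed by `App`. The function names are hypothetical, and the statistics (mean rating per genre, median installs, Pearson correlation) are one reasonable choice rather than the template's prescribed method. Run the per-genre and per-type functions on `apps_data` rather than on the merged table, so each app counts once instead of once per review. The apps sample reuses values from `apps_data.head()`; the review rows are made up purely to exercise the last function.

```python
import pandas as pd


def mean_rating_by_genre(apps: pd.DataFrame) -> pd.Series:
    """Mean Rating per genre, highest first.

    Apps listed under several genres (e.g. "Art & Design;Pretend Play")
    are counted once in each of those genres.
    """
    per_genre = apps.assign(Genre=apps["Genres"].str.split(";")).explode("Genre")
    return per_genre.groupby("Genre")["Rating"].mean().sort_values(ascending=False)


def median_installs_by_type(apps: pd.DataFrame) -> pd.Series:
    """Median install count for Free vs Paid apps (expects a numeric Installs column)."""
    return apps.groupby("Type")["Installs"].median()


def review_length_vs_rating(merged: pd.DataFrame) -> float:
    """Pearson correlation between each app's mean review length (in words) and its Rating.

    Expects the merged table indexed by App; reviews without text are skipped.
    """
    reviews = merged.dropna(subset=["Translated_Review"])
    reviews = reviews.assign(Words=reviews["Translated_Review"].str.split().str.len())
    per_app = reviews.groupby(level="App")[["Words", "Rating"]].mean()
    return per_app["Words"].corr(per_app["Rating"])


# Five rows copied from apps_data.head(), with Installs already converted to numbers.
apps = pd.DataFrame(
    {
        "Genres": ["Art & Design", "Art & Design;Pretend Play", "Art & Design",
                   "Art & Design", "Art & Design;Creativity"],
        "Rating": [4.1, 3.9, 4.7, 4.5, 4.3],
        "Type": ["Free"] * 5,
        "Installs": [10_000, 500_000, 5_000_000, 50_000_000, 100_000],
    },
    index=pd.Index(["Photo Editor & Candy Camera & Grid & ScrapBook", "Coloring book moana",
                    "U Launcher Lite – FREE Live Cool Themes, Hide Apps",
                    "Sketch - Draw & Paint", "Pixel Draw - Number Art Coloring Book"],
                   name="App"),
)
print(mean_rating_by_genre(apps))
print(median_installs_by_type(apps))  # only "Free" here: all five sample apps are free

# Made-up review rows (hypothetical apps A, B, C), only to exercise the last function.
toy = pd.DataFrame(
    {
        "Translated_Review": ["good app", "really good fun app", None, "bad", "love it so very much"],
        "Rating": [4.0, 4.0, 4.0, 3.0, 5.0],
    },
    index=pd.Index(["A", "A", "A", "B", "C"], name="App"),
)
print(review_length_vs_rating(toy))
```

The median is used for installs because install counts span several orders of magnitude, so a few very popular apps would dominate a mean.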
# Start coding