Google Play Store Apps Data
This dataset (source) consists of two tables. One contains data about applications on the Google Play Store, such as category, number of installs, and genre. The other contains data about reviews of those apps, such as the review text and its sentiment. You can either join the two tables on the key, App, or keep them separate for your analysis.
# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load your data
The dataset consists of two separate tables. We will load them separately and join them on the App key. During your analysis, you can choose which of the three tables to use: the apps table, the reviews table, or the merged table.
# Load the app data from apps_data.csv (googleplaystore.csv in the original source)
apps_data = pd.read_csv('apps_data.csv', index_col=0)
apps_data.head()
App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
U Launcher Lite – FREE Live Cool Themes, Hide Apps | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
# Load the review data from review_data.csv (googleplaystore_user_reviews.csv in the original source)
review_data = pd.read_csv('review_data.csv', index_col=0)
review_data.head()
App | Translated_Review | Sentiment | Sentiment_Polarity | Sentiment_Subjectivity |
---|---|---|---|---|
10 Best Foods for You | I like eat delicious food. That's I'm cooking ... | Positive | 1.00 | 0.533333 |
10 Best Foods for You | This help eating healthy exercise regular basis | Positive | 0.25 | 0.288462 |
10 Best Foods for You | NaN | NaN | NaN | NaN |
10 Best Foods for You | Works great especially going grocery store | Positive | 0.40 | 0.875000 |
10 Best Foods for You | Best idea us | Positive | 1.00 | 0.300000 |
# Merge the two dataframes
data = pd.merge(apps_data, review_data, on="App")
data.head()
App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver | Translated_Review | Sentiment | Sentiment_Polarity | Sentiment_Subjectivity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | A kid's excessive ads. The types ads allowed a... | Negative | -0.250 | 1.000000 |
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | It bad >:( | Negative | -0.725 | 0.833333 |
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | like | Neutral | 0.000 | 0.000000 |
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | NaN | NaN | NaN | NaN |
Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | I love colors inspyering | Positive | 0.500 | 0.600000 |
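Note that `pd.merge` defaults to an inner join, so each app row is repeated once per matching review and apps without any reviews drop out of the merged table. The sketch below illustrates this on toy frames; the app names and values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for apps_data and review_data (illustrative values only)
apps = pd.DataFrame(
    {"Rating": [3.9, 4.1, 4.7]},
    index=pd.Index(["App A", "App B", "App C"], name="App"),
)
reviews = pd.DataFrame(
    {"Sentiment": ["Positive", "Negative", "Neutral"]},
    index=pd.Index(["App A", "App A", "App B"], name="App"),
)

# Default how="inner": only apps present in both tables survive,
# and "App A" appears twice because it has two reviews
inner = pd.merge(apps, reviews, on="App")

# how="left" keeps every app; apps without reviews get NaN in the review columns
left = pd.merge(apps, reviews, on="App", how="left")

print(len(inner), len(left))
```

If you need per-app statistics, it may be simpler to work with the apps table directly so that apps with many reviews are not counted multiple times.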
Understand your variables
# Understand your variables
# Note: nunique() ignores NaN while unique() includes it, so the two can differ by one
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])
for i, var in enumerate(data.columns):
    variables.loc[i] = [var, data[var].nunique(), data[var].unique().tolist()]
variables
 | Variable | Number of unique values | Values |
---|---|---|---|
0 | Category | 33 | [ART_AND_DESIGN, FAMILY, AUTO_AND_VEHICLES, BE... |
1 | Rating | 23 | [3.9, 4.4, 4.3, 4.1, 4.7, 4.5, 4.2, 4.9, 4.6, ... |
2 | Reviews | 1330 | [967, 974, 13791, 1518, 194216, 654, 20260, 20... |
3 | Size | 178 | [14M, 33M, 37M, 39M, 12M, 25M, 6.1M, 11M, Vari... |
4 | Installs | 13 | [500,000+, 1,000,000+, 100,000+, 5,000,000+, 1... |
5 | Type | 2 | [Free, Paid] |
6 | Price | 14 | [0, $6.99, $1.99, $4.99, $3.99, $2.99, $11.99,... |
7 | Content Rating | 5 | [Everyone, Teen, Mature 17+, Everyone 10+, Adu... |
8 | Genres | 73 | [Art & Design;Pretend Play, Art & Design, Art ... |
9 | Last Updated | 299 | [January 15, 2018, September 20, 2017, August ... |
10 | Current Ver | 598 | [2.0.0, 2.9.2, 1.2.3, 2.2.5, 1.1, 1.0.8, 1.03,... |
11 | Android Ver | 25 | [4.0.3 and up, 3.0 and up, 2.3 and up, 4.0 and... |
12 | Translated_Review | 26682 | [A kid's excessive ads. The types ads allowed ... |
13 | Sentiment | 3 | [Negative, Neutral, nan, Positive] |
14 | Sentiment_Polarity | 5295 | [-0.25, -0.7249999999999999, 0.0, nan, 0.5, -0... |
15 | Sentiment_Subjectivity | 4382 | [1.0, 0.8333333333333333, 0.0, nan, 0.6, 0.9, ... |
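The table shows that Installs, Price, and Size are stored as strings ("500,000+", "$6.99", "14M"), so they need converting before any numeric analysis. Here is a minimal sketch of one way to do that, using a toy frame built from string formats visible above. The helper `size_to_mb` is hypothetical, and its handling of a "k" suffix is an assumption, since only "M" values and "Varies with device" appear in the output shown here.

```python
import numpy as np
import pandas as pd

# Toy sample using string formats seen in the table above (not the real data)
sample = pd.DataFrame({
    "Installs": ["500,000+", "1,000,000+", "100,000+"],
    "Price": ["0", "$6.99", "$1.99"],
    "Size": ["14M", "Varies with device", "6.1M"],
})

def size_to_mb(size):
    """Convert a size string such as '14M' to megabytes; other formats become NaN."""
    if isinstance(size, str) and size.endswith("M"):
        return float(size[:-1])
    if isinstance(size, str) and size.endswith("k"):  # assumption: 'k' means kilobytes
        return float(size[:-1]) / 1024
    return np.nan

clean = pd.DataFrame({
    # Strip thousands separators and the trailing '+', then convert to integers
    "Installs": sample["Installs"].str.replace(r"[,+]", "", regex=True).astype(int),
    # Drop the dollar sign before converting to float
    "Price": sample["Price"].astype(str).str.replace("$", "", regex=False).astype(float),
    # 'Varies with device' has no numeric size, so it maps to NaN
    "Size_MB": sample["Size"].map(size_to_mb),
})
print(clean)
```

Keep in mind that a value such as "500,000+" is a lower bound on installs, not an exact count, so the converted numbers are best treated as ordered buckets.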
Answer interesting questions:
Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:
- Which genres tend to get higher ratings?
- Is there a substantial difference between the number of downloads of free apps and paid apps?
- Are people more likely to review bad applications or good applications?
- Does the average number of words in a review correlate with the overall rating of an application?
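As one possible starting point for the first two questions, here is a sketch on a toy frame that reuses the column names of the apps table. It rests on two assumptions that are not part of the template: multi-genre strings such as "Art & Design;Pretend Play" are split on ";" so that each genre is counted separately, and install counts are compared by median rather than mean.

```python
import pandas as pd

# Toy stand-in for apps_data (illustrative values only)
apps = pd.DataFrame({
    "Genres": ["Art & Design", "Art & Design;Pretend Play", "Tools", "Tools"],
    "Rating": [4.0, 3.0, 4.5, 3.5],
    "Type": ["Free", "Free", "Paid", "Free"],
    "Installs": ["10,000+", "500,000+", "1,000+", "100,000+"],
})

# Question 1: split multi-genre strings so each genre gets its own row,
# then average the ratings per genre
by_genre = (
    apps.assign(Genre=apps["Genres"].str.split(";"))
        .explode("Genre")
        .groupby("Genre")["Rating"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
)

# Question 2: compare install counts by Type; the median is less sensitive
# to a handful of extremely popular apps than the mean
installs = apps["Installs"].str.replace(r"[,+]", "", regex=True).astype(int)
by_type = installs.groupby(apps["Type"]).median()

print(by_genre)
print(by_type)
```

The `count` column is worth checking alongside the mean, because a genre with only a few apps can rank high or low by chance.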
# Start coding
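For the question about review length, one approach is to count the words in each review, average them per app, and correlate that average with the app's rating. The sketch below uses a toy frame shaped like the merged table. The review texts and ratings are invented, and counting words by splitting on whitespace is a simplifying assumption.

```python
import pandas as pd

# Toy stand-in for the merged table (illustrative values only)
data = pd.DataFrame(
    {
        "Rating": [3.9, 3.9, 3.9, 4.5, 4.5, 4.2],
        "Translated_Review": [
            "It bad", None, "like",
            "Works great every day", "Best idea us",
            "ok",
        ],
    },
    index=pd.Index(["App A", "App A", "App A", "App B", "App B", "App C"], name="App"),
)

# Drop rows without review text, then count whitespace-separated words
reviews = data.dropna(subset=["Translated_Review"]).copy()
reviews["Word_Count"] = reviews["Translated_Review"].str.split().str.len()

# Aggregate to one row per app so apps with many reviews are not over-weighted
per_app = reviews.groupby("App").agg(
    Rating=("Rating", "first"),
    Avg_Words=("Word_Count", "mean"),
)

# Pearson correlation between average review length and app rating
print(per_app["Avg_Words"].corr(per_app["Rating"]))
```

On toy data the resulting number carries no meaning; the point is the sequence of steps, which can be applied to `data` once it is loaded.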