The Android App Market on Google Play

Load, clean, and visualize scraped Google Play Store data to understand the Android app market.

  • 9 tasks
  • 882 participants
  • 1,500 XP

Project Description

Mobile apps are everywhere. They are easy to create and can be lucrative. Because of these two factors, more and more apps are being developed. In this project, you will do a comprehensive analysis of the Android app market by comparing over ten thousand apps in Google Play across different categories. You'll look for insights in the data to devise strategies to drive growth and retention.

This project lets you apply the skills from Manipulating DataFrames with pandas and Python Data Science Toolbox (Part 1). We recommend that you take those courses before starting this project.

The data for this project was scraped from the Google Play website. There are many popular datasets for the Apple App Store but few for Google Play apps, partly because Google Play is harder to scrape. The data files are as follows:

  • apps.csv: contains all the details of the applications on Google Play. There are 13 features that describe a given app.
  • user_reviews.csv: contains 100 reviews for each app, most helpful first. The text in each review has been pre-processed and annotated with three new features: Sentiment (Positive, Negative, or Neutral), Sentiment Polarity, and Sentiment Subjectivity.
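
As a rough illustration of working with the review data described above, here is a minimal pandas sketch. It uses a small in-memory sample in place of the real user_reviews.csv so that it runs on its own. The column names (App, Translated_Review, Sentiment_Polarity, Sentiment_Subjectivity) and the helper load_reviews are assumptions made for this example; only the three sentiment features are named in the description, and the actual file may spell them differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for user_reviews.csv. The column names below are
# assumptions for illustration; check the real file's header before reuse.
SAMPLE_REVIEWS = """App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
Demo App A,Great app,Positive,0.8,0.75
Demo App A,Crashes a lot,Negative,-0.5,0.6
Demo App B,It is okay,Neutral,0.0,0.2
Demo App B,,,,
"""


def load_reviews(source):
    """Read review data and drop rows that have no sentiment label."""
    reviews = pd.read_csv(source)
    return reviews.dropna(subset=["Sentiment"])


# With the real data this would be load_reviews("user_reviews.csv").
reviews = load_reviews(io.StringIO(SAMPLE_REVIEWS))

# Number of reviews per sentiment label.
sentiment_counts = reviews["Sentiment"].value_counts()

# Average polarity per app.
mean_polarity = reviews.groupby("App")["Sentiment_Polarity"].mean()

print(sentiment_counts)
print(mean_polarity)
```

Dropping unlabeled rows first keeps the counts and averages from being skewed by empty reviews; whether the real file contains such rows is something to check after loading it.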

Project Tasks

  • 1. Google Play Store apps and reviews
  • 2. Data cleaning
  • 3. Exploring app categories
  • 4. Distribution of app ratings
  • 5. Size and price of an app
  • 6. Relation between app category and app price
  • 7. Filter out "junk" apps
  • 8. Popularity of paid apps vs free apps
  • 9. Sentiment analysis of user reviews

Lavanya Gupta

Machine Learning Engineer at PropTiger.com

Lavanya is a software engineer by profession with research interests in Data Science, Machine Learning, and Deep Learning. She has extensive experience leading data-driven production projects in industry. She is a passionate Python programmer and loves to experiment with new datasets that she scrapes on her own!

Technology

  • Python

Topics

  • Data Manipulation
  • Data Visualization
  • Probability & Statistics
  • Importing & Cleaning Data