Learn SQL Basics for Data Science Specialization
SQL for Data Science Capstone Project (Coursera)
Student: Michele Bedin
Milestone 1: Project Proposal and Data Selection/Preparation
Instructions
You are a data scientist working for a data analytics firm. Your firm has explored a multitude of data sources and is tasked with providing key insights that your clients can make actionable. Your manager has asked you to provide some data analytics guidance for one of the firm’s clients.
In a typical scenario, you would work iteratively with your client to understand the data to be analyzed. A solid understanding of the data and any underlying assumptions is crucial to the success of a data analysis project. However, in this case, you will need to do a little more of the “heavy lifting”.
To begin, you will prepare a project proposal detailing:
- the questions you want to answer,
- your initial hypothesis about the data relationships, and
- the approach you will take to get your answers.
NOTE: The proposal is just a plan for how you will travel. It’s there to keep you on your path by keeping the end goal in mind. You will then execute your plan and, in a month, present your findings to your management.
Review criteria
The project proposal you will develop will guide you where you want to go, but may change along the way; and that’s OK! To kick things off you will need to:
- Select your client
- Import your dataset
- Explore and understand your data
- Develop an Entity Relationship Diagram (ERD)
For this milestone, you will upload a PDF version of the two key steps needed in developing your project proposal:
- Preparing for Your Project Proposal
- Develop Your Project Proposal
Step 1: Preparing for Your Proposal
You will document your preparation in developing the project proposal. This includes:
1. Which client/dataset did you select and why?
Client 3: SportsStats (Olympics Dataset - 120 years of data)
SportsStats is a sports analysis firm partnering with local news and elite personal trainers to provide “interesting” insights to help their partners. Insights could be patterns/trends highlighting certain groups/events/countries, etc. for the purpose of developing a news story or discovering key health insights.
I chose this dataset because of its subject matter and because it is smaller than the other proposed options, which makes the analysis easier for a teaching exercise.
The aim of the project proposal is to understand how women's inclusion in the Olympics has evolved over the years.
2. Describe the steps you took to import and clean the data.
To import the data, I used Python's pandas library, which provides high-performance, easy-to-use data analysis tools. Specifically, I used the read_csv function, which reads a CSV file and returns a DataFrame: a two-dimensional, size-mutable data structure whose columns can hold different data types.
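As a minimal sketch of what read_csv returns, the snippet below parses a small in-memory CSV rather than the real files. The column names and rows are made up for illustration and are not taken from the Olympics dataset.

```python
import io

import pandas as pd

# Hypothetical sample data: column names and rows are illustrative
# assumptions, not the actual contents of the Olympics dataset.
sample_csv = io.StringIO(
    "ID,Name,Sex,Year\n"
    "1,Athlete A,F,1996\n"
    "2,Athlete B,M,2000\n"
)

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(sample_csv)

# Each column gets its own inferred dtype (integers for ID and Year,
# generic text for Name and Sex), which is what "heterogeneous" means here.
print(df.shape)
print(df.dtypes)
```

With a real file, the StringIO object would simply be replaced by the path to the CSV.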
Here are the steps I followed:
I imported the pandas library under its conventional alias, pd:
import pandas as pd