Data Analyst: Michele Bedin (www.michelebedin.com)
- Step 1 - Project proposal
- Step 2 - Understand the data
- Step 3 - EDA
- Step 4 - Statistical tests (current)
- Step 5 - Regression modelling
- Step 6 - Machine learning models
- Step 7 - Work delivery
Introduction
You are a data professional at a consulting company called Automatidata. The ongoing project for its new client, the New York City Taxi & Limousine Commission (New York City TLC), is reaching its midpoint: the project proposal (phase 1), the Python coding work (phase 2) and the exploratory data analysis (phase 3) are complete.
You receive a new email from Uli King, the Automatidata project manager. Uli informs your team of a new request from the New York City TLC: to analyse the relationship between the fare amount and the payment type. You also find follow-up emails from three other team members: Deshawn Washington, Luana Rodriguez and Udo Bankole. These emails discuss the details of the analysis. A final email from Luana contains your specific assignment: to conduct an A/B test.
Step 4: Statistical tests
In this activity you will practise using statistics to analyse and interpret data. The activity covers fundamental concepts such as descriptive statistics and hypothesis testing. You will explore the data provided and conduct A/B tests and hypothesis tests.
The purpose of this project is to demonstrate your knowledge of how A/B tests are prepared, created and analysed. The results of your A/B tests should be aimed at finding a way to generate more revenue for taxi drivers.
Note: For the purposes of this exercise, assume that the sample data comes from an experiment in which customers are randomly selected and divided into two groups: 1) customers who are required to pay by credit card, 2) customers who are required to pay in cash. Without this assumption, we cannot draw causal conclusions about how the payment method affects the fare amount.
The objective is to apply descriptive statistics and hypothesis testing in Python. Specifically, the A/B test samples the data and analyses whether there is a relationship between payment type and fare amount, for example whether customers who pay by credit card pay higher fares than customers who pay in cash.
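The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the column names `payment_type` and `fare_amount` are hypothetical stand-ins for whatever the real dataset uses, and Welch's two-sample t-test with a 5% significance level is one reasonable choice rather than a method prescribed by the project brief.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the trip data.
# Column names and all values are assumptions made for this sketch.
rng = np.random.default_rng(seed=42)
trips = pd.DataFrame({
    "payment_type": ["credit_card"] * 500 + ["cash"] * 500,
    "fare_amount": np.concatenate([
        rng.normal(loc=13.5, scale=5.0, size=500),  # simulated card fares
        rng.normal(loc=12.0, scale=5.0, size=500),  # simulated cash fares
    ]),
})

# Descriptive statistics: mean fare for each payment type.
mean_fares = trips.groupby("payment_type")["fare_amount"].mean()

# Split the fares into the two groups being compared.
card = trips.loc[trips["payment_type"] == "credit_card", "fare_amount"]
cash = trips.loc[trips["payment_type"] == "cash", "fare_amount"]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(card, cash, equal_var=False)

alpha = 0.05  # significance level, an assumption for this sketch
reject_null = p_value < alpha

print(mean_fares)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Because the groups here are simulated, the printed numbers say nothing about real taxi fares; the sketch only shows the shape of the workflow: summarise each group, run the test, then compare the p-value with the chosen significance level.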
This activity consists of four tasks.
Task 1: Importing and loading data:
- Which data packages will be needed for hypothesis testing?
Tasks 2 and 3: Conducting EDA and hypothesis testing:
- How did the calculation of descriptive statistics help you to analyse the data?
- How did you formulate the null hypothesis and the alternative hypothesis?
Task 4: Communicating insights to stakeholders:
- What are the main business insights that emerged from your A/B test?
- What business recommendations do you propose based on the results?
Conduct an A/B test
PACE
PACE problem-solving framework: Plan, Analyse, Construct and Execute.
PACE: Plan
At this stage, consider the following questions where applicable, and support your answers with code:
- What is your research question for this data project? Next, you will need to formulate the null and alternative hypotheses as the first step in your hypothesis test. Consider your research question now, at the beginning of this task.
The research question for this data project is: "Is there a relationship between the total amount of the fare and the type of payment?" This question aims to investigate whether the payment method (credit card or cash) affects the total amount of the fare paid by taxi customers. In other words, we are trying to understand whether customers who pay by credit card tend to pay higher fare amounts than those who pay in cash.
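One way to formalise this question, assuming the comparison is made between mean fare amounts, is shown below. The symbols are notation introduced here for illustration: \mu_{\text{card}} and \mu_{\text{cash}} stand for the population mean fare amounts of customers paying by credit card and in cash.

```latex
\[
H_0:\ \mu_{\text{card}} = \mu_{\text{cash}}
\qquad \text{versus} \qquad
H_A:\ \mu_{\text{card}} \neq \mu_{\text{cash}}
\]
```

This is the two-sided form, which tests for any difference between the groups. If only the direction "credit card fares are higher" were of interest, the alternative could instead be written as \mu_{\text{card}} > \mu_{\text{cash}}.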
Task 1: Importing and loading data
Import the packages and libraries needed to calculate descriptive statistics and conduct a hypothesis test:
import pandas as pd
from scipy import stats