In this project, you will work with data on a large number of taxi journeys in New York from 2013. You will use regression trees and random forests to predict the value of fares and tips based on location, date, and time.
Before taking on this project, we recommend that you have completed Introduction to the Tidyverse.
While not required, it can also help to have more extensive experience with the packages dplyr, ggplot2, and randomForest, which you can get in the following courses: Data Manipulation with dplyr, Introduction to Data Visualization with ggplot2, and Supervised Learning In R: Regression.
The dataset used in this project is a sample from the complete 2013 NYC taxi data, which was originally obtained and published by Chris Whong.
Founder & Data Sherpa at bayescamp.com
Robert offers training and career coaching for statisticians and data scientists, especially around data visualization and Bayesian modeling, using software like R, Stata, and Stan. Before setting up BayesCamp, he was a university statistics lecturer and healthcare researcher. He is a contributor to the open-source Bayesian software Stan.