Predict Taxi Fares with Random Forests
Use regression trees and random forests to find places where New York taxi drivers earn the most.
11 Tasks | 1,500 XP
Project Description
In this project, you will work with data from a large number of taxi journeys in New York from 2013. You will use regression trees and random forests to predict the value of fares and tips based on location, date, and time. While not required, it helps to have some experience with the packages dplyr, ggplot2, and randomForest.
The dataset used in this project is a sample from the complete 2013 NYC taxi data, which was originally obtained and published by Chris Whong.
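To give a feel for the kind of modeling involved, here is a minimal sketch in R that fits one regression tree and one random forest to predict a fare from pickup location and time of day. It is not the project's own code: the data are synthetic, the column names (pickup_longitude, pickup_latitude, hour, total) are hypothetical, and the use of rpart for the single tree is an assumption, since the description only names randomForest.

```r
# Minimal sketch on synthetic data; not the project's actual code or dataset.
library(rpart)         # regression trees (assumed choice of package)
library(randomForest)  # random forests

set.seed(42)
n <- 500

# Hypothetical stand-in for the taxi data: pickup coordinates and hour of day.
taxi <- data.frame(
  pickup_longitude = runif(n, -74.02, -73.93),
  pickup_latitude  = runif(n, 40.70, 40.83),
  hour             = sample(0:23, n, replace = TRUE)
)

# Made-up relationship: fares rise with latitude and in the evening, plus noise.
taxi$total <- 8 + 60 * (taxi$pickup_latitude - 40.70) +
  ifelse(taxi$hour >= 17, 3, 0) + rnorm(n, sd = 2)

# A single regression tree using pickup location only.
fitted_tree <- rpart(total ~ pickup_latitude + pickup_longitude, data = taxi)

# A random forest that adds hour of day as a further predictor.
fitted_forest <- randomForest(
  total ~ pickup_latitude + pickup_longitude + hour,
  data = taxi, ntree = 100
)

# In-sample predictions from both models, stored next to the observed fare.
taxi$pred_tree   <- predict(fitted_tree, taxi)
taxi$pred_forest <- predict(fitted_forest, taxi)

head(taxi)
```

The predicted columns can then be compared with the observed fare or summarized by location, which is the general idea behind mapping where fares are highest.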
Project Tasks
- 1. 49999 New York taxi trips
- 2. Cleaning the taxi data
- 3. Zooming in on Manhattan
- 4. Where does the journey begin?
- 5. Predicting taxi fares using a tree
- 6. It's time. More predictors.
- 7. One more tree!
- 8. One tree is not enough
- 9. Plotting the predicted fare
- 10. Plotting the actual fare
- 11. Where do people spend the most?
Technologies
R
Robert Grant
Founder & Data Sherpa at bayescamp.com
Robert offers training and career coaching for statisticians and data scientists, especially around data visualization and Bayesian modeling, using software like R, Stata, and Stan. Before setting up BayesCamp, he was a university statistics lecturer and healthcare researcher. He is a contributor to the open-source Bayesian software Stan.