Premium project

Predict Taxi Fares with Random Forests

Use regression trees and random forests to find places where New York taxi drivers earn the most.

11 Tasks, 1,500 XP



Project Description

In this project, you will work with data from a large number of New York taxi journeys made in 2013. You will use regression trees and random forests to predict the value of fares and tips based on location, date and time. While not required, some experience with the R packages `dplyr`, `ggplot2` and `randomForest` will help. The dataset used in this project is a sample from the [complete 2013 NYC taxi data](https://chriswhong.com/open-data/foil_nyc_taxi/), which was originally obtained and published by Chris Whong.

Project Tasks

  1. 49,999 New York taxi trips
  2. Cleaning the taxi data
  3. Zooming in on Manhattan
  4. Where does the journey begin?
  5. Predicting taxi fares using a tree
  6. It's time. More predictors.
  7. One more tree!
  8. One tree is not enough
  9. Plotting the predicted fare
  10. Plotting the actual fare
  11. Where do people spend the most?

Technologies
R

Topics
Data Visualization, Machine Learning, Case Studies

Robert Grant

Founder & Data Sherpa at bayescamp.com
Robert offers training and career coaching for statisticians and data scientists, especially in data visualization and Bayesian modeling, using software such as R, Stata and Stan. Before setting up BayesCamp, he was a university statistics lecturer and healthcare researcher. He is a contributor to the open-source Bayesian software Stan.
