Scalable Data Processing in R
Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.
4 hours · 15 videos · 49 exercises · Big Data with R track
Datasets are often larger than available RAM, which causes problems for R programmers, since by default all variables are stored in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll make use of the Federal Housing Finance Agency's data, a publicly available data set chronicling all mortgages that were held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.
Working with increasingly large data sets
In this chapter, we cover the reasons you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data using the base R functions can be slow, and present some easy ways to remedy this. Finally, we introduce the bigmemory package.

- What is Scalable Data Processing? (50 xp)
- Why is your code slow? (50 xp)
- How does processing time vary by data size? (100 xp)
- Working with "Out-of-Core" Objects using the Bigmemory Project (50 xp)
- Reading a big.matrix object (100 xp)
- Attaching a big.matrix object (100 xp)
- Creating tables with big.matrix objects (100 xp)
- Data summary using bigsummary (100 xp)
- References vs. Copies (50 xp)
- Copying matrices and big matrices (100 xp)
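The core idea of the chapter can be sketched in a few lines: a big.matrix keeps its data in a file on disk rather than in RAM, and attaching the descriptor file gives you a second reference to the same data with no copy. This is a minimal sketch assuming the bigmemory package is installed; the file names are illustrative, not from the course.

```r
library(bigmemory)

# Create a file-backed big.matrix: the data live on disk, not in RAM.
x <- big.matrix(nrow = 5, ncol = 2, type = "double",
                backingfile = "example.bin",
                descriptorfile = "example.desc")
x[, 1] <- 1:5
x[, 2] <- 6:10

# Attach the same backing file (possibly from another R session):
# both objects reference the same on-disk data -- no copy is made.
y <- attach.big.matrix("example.desc")
y[1, 1] <- 100
x[1, 1]  # also 100, because x and y share the backing file
```

Because `x` and `y` are references, modifying one is visible through the other; this is the "References vs. Copies" distinction the chapter covers.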
Processing and Analyzing Data with bigmemory
Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.

- The Bigmemory Suite of Packages (50 xp)
- Tabulating using bigtable (100 xp)
- Borrower Race and Ethnicity by Year (I) (100 xp)
- Split-Apply-Combine (50 xp)
- Female Proportion Borrowing (100 xp)
- Split (100 xp)
- Apply (100 xp)
- Combine (100 xp)
- Visualize your results using the tidyverse (50 xp)
- Visualizing Female Proportion Borrowing (100 xp)
- The Borrower Income Ratio (100 xp)
- Tidy Big Tables (100 xp)
- Limitations of bigmemory (50 xp)
- Where should you use bigmemory? (50 xp)
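The split-apply-combine pattern above can be sketched in base R on an ordinary data frame; the course applies the same pattern to a big.matrix using functions from the bigmemory suite. The column names and toy data here are illustrative, not from the course's mortgage file.

```r
# Toy mortgage-like data: one row per loan, with an indicator
# for whether the borrower is female (names are illustrative).
mortgages <- data.frame(
  year   = c(2009, 2009, 2010, 2010, 2010, 2011),
  female = c(1, 0, 1, 1, 0, 0)
)

# Split: group the row indices by year.
groups <- split(seq_len(nrow(mortgages)), mortgages$year)

# Apply: compute the female proportion within each group.
props <- vapply(groups,
                function(rows) mean(mortgages$female[rows]),
                numeric(1))

# Combine: vapply returns a named vector, one entry per year.
props  # 2009 = 0.5, 2010 = 0.667 (approx.), 2011 = 0
```

Splitting indices (rather than copying the rows themselves) is what makes the pattern scale: with a big.matrix, each group of rows can be pulled from disk only when its summary is computed.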
Working with iotools
We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.

- Introduction to chunk-wise processing (50 xp)
- Can you split-compute-combine it? (50 xp)
- Foldable operations (I) (100 xp)
- Foldable operations (II) (100 xp)
- A first look at iotools: Importing data (50 xp)
- Compare read.delim() and read.delim.raw() (100 xp)
- Reading raw data and turning it into a data structure (100 xp)
- chunk.apply (50 xp)
- Reading chunks in as a matrix (100 xp)
- Reading chunks in as a data.frame (100 xp)
- Parallelizing calls to chunk.apply (100 xp)
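Chunk-wise processing and foldable operations can be sketched with base R connections before reaching for iotools: a mean is "foldable" because per-chunk sums and counts can be combined at the end, so no chunk result depends on the others. The file and chunk size below are illustrative; iotools streamlines the same loop with chunk.apply().

```r
# Write a small one-column file to process in chunks.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:100), path)

con <- file(path, open = "r")
total <- 0
count <- 0
repeat {
  lines <- readLines(con, n = 25)   # read 25 rows per chunk
  if (length(lines) == 0) break
  vals <- as.numeric(lines)
  # Fold step: accumulate per-chunk sum and count; only these
  # two numbers, not the data, must stay in memory.
  total <- total + sum(vals)
  count <- count + length(vals)
}
close(con)
total / count   # the mean of 1:100, computed without loading the file at once
```

Because each chunk's contribution is independent, the same fold also parallelizes, which is what the parallel option of chunk.apply() exploits.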
Case Study: A Preliminary Analysis of the Housing Data
In the previous chapters, we've introduced the housing data and shown how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.

- Overview of types of analysis for this chapter (50 xp)
- Race and Ethnic Representation in the Mortgage Data (100 xp)
- Comparing the Borrower Race/Ethnicity and their Proportions (100 xp)
- Are the data missing at random? (50 xp)
- Looking for Predictable Missingness (100 xp)
- A little more about missingness (50 xp)
- Analyzing the Housing Data (50 xp)
- Borrower Race and Ethnicity by Year (II) (100 xp)
- Visualizing the Adjusted Demographic Trends (100 xp)
- Relative change in demographic trend (100 xp)
- Borrower Lending Trends: City vs. Rural (50 xp)
- Borrower Region by Year (100 xp)
- Who is securing federally guaranteed loans? (100 xp)
- Congratulations! (50 xp)
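One of the chapter's questions, whether values are missing at random, can be sketched with a base R cross-tabulation: if the proportion of missing values differs sharply across groups, the missingness is predictable and the data are unlikely to be missing completely at random. The variable names and toy data below are illustrative, not from the course's mortgage file.

```r
# Toy data: a group indicator and an income column with gaps.
d <- data.frame(
  urban  = c(1, 1, 1, 0, 0, 0, 1, 0),
  income = c(55, NA, 61, NA, NA, 48, 70, NA)
)

# Tabulate missingness of income against the urban/rural indicator.
tab <- table(urban = d$urban, missing = is.na(d$income))
tab

# Proportion missing within each group; a large difference between
# rows is evidence of predictable missingness.
prop.table(tab, margin = 1)
```

In this toy example, 3 of 4 rural rows are missing income versus 1 of 4 urban rows, so a model of the missingness indicator on the group would have real predictive power.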
In the following tracks: Big Data with R
Datasets: Mortgage data (sample)
Prerequisites: Writing Efficient R Code
Instructor: Michael Kane, Assistant Professor at Yale University. His research is in the area of scalable statistical/machine learning and applied probability.
Instructor: Simon Urbanek, member of the R-Core and Lead Inventive Scientist at AT&T Labs Research. His research is in the areas of R, statistical computing, visualization, and interactive graphics.