
Course

Scalable Data Processing in R

Skill level: Advanced

Updated 08.2024

Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.


R, Programming

4 hours

15 videos

49 exercises

3,950 XP

6,148

Statement of Accomplishment


Course Description

Datasets are often larger than available RAM, which causes problems for R programmers, since by default R stores all variables in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll make use of the Federal Housing Finance Agency's data, a publicly available dataset chronicling all mortgages held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.

Prerequisites

Writing Efficient R Code

1

Working with increasingly large data sets

In this chapter, we cover the reasons you need to apply new techniques when datasets are larger than available RAM. We show that importing and exporting data with base R functions can be slow, and present some easy ways to remedy this. Finally, we introduce the bigmemory package.

What is Scalable Data Processing?

Why is your code slow?

How does processing time vary by data size?

Working with "Out-of-Core" Objects using the Bigmemory Project

Reading a big.matrix object

Attaching a big.matrix object

Creating tables with big.matrix objects

Data summary using bigsummary

References vs. Copies

Copying matrices and big matrices
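The References vs. Copies lessons turn on how R copies data. The sketch below is a base R analogy only, not bigmemory code. An ordinary matrix is copied as soon as a second name modifies it. An environment behaves like a reference, which is closer to how a big.matrix behaves. All object names here are hypothetical. In bigmemory itself, assigning a big.matrix to a new name does not duplicate the underlying data, and the package's `deepcopy()` function makes an independent copy.

```r
# Base R matrices use copy-on-modify semantics
m <- matrix(1:6, nrow = 2)
m2 <- m            # m2 initially shares memory with m
m2[1, 1] <- 100L   # modifying m2 triggers a copy, so m is unchanged
m[1, 1]            # still 1

# Environments have reference semantics (an analogy for big.matrix)
e <- new.env()
e$value <- 1
e2 <- e            # e2 refers to the same environment, no copy is made
e2$value <- 100    # the change is visible through e as well
e$value            # now 100
```

For large data, reference semantics avoids silently doubling memory use. The trade-off is that a change made through one name is visible through every other name for the same object.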


2

Processing and Analyzing Data with bigmemory

Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.

The Bigmemory Suite of Packages

Tabulating using bigtable

Borrower Race and Ethnicity by Year (I)

Split-Apply-Combine

Female Proportion Borrowing

Visualize your results using the tidyverse

Visualizing Female Proportion Borrowing

The Borrower Income Ratio

Tidy Big Tables

Limitations of bigmemory

Where should you use bigmemory?
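Here is a minimal base R sketch of split-apply-combine. It computes a female borrowing proportion per year, similar in spirit to the Female Proportion Borrowing lesson. The data frame `loans` and its columns `year` and `female` are hypothetical stand-ins for the mortgage data. The course itself works on a big.matrix rather than a data frame. The sketch splits row indices rather than the data itself, so only the indices need to sit in memory. That is what lets the same pattern carry over to out-of-core objects.

```r
# Hypothetical toy data standing in for the mortgage records
loans <- data.frame(
  year   = c(2009, 2009, 2010, 2010, 2010, 2011),
  female = c(1, 0, 1, 1, 0, 0)   # 1 = female borrower, 0 = otherwise
)

# Split: group the row indices by year
idx_by_year <- split(seq_len(nrow(loans)), loans$year)

# Apply: compute the female proportion within each group
prop_by_year <- lapply(idx_by_year, function(i) mean(loans$female[i]))

# Combine: collapse the per-group results into one named vector
female_prop <- unlist(prop_by_year)
female_prop
```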


3

Working with iotools

In this chapter, we use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.

Introduction to chunk-wise processing

Can you split-compute-combine it?

Foldable operations (I)

Foldable operations (II)

A first look at iotools: Importing data

Compare read.delim() and read.delim.raw()

Reading raw data and turning it into a data structure

chunk.apply

Reading chunks in as a matrix

Reading chunks in as a data.frame

Parallelizing calls to chunk.apply
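Chunk-wise processing works for foldable operations, whose per-chunk results can be combined into the global result. The mean is foldable because a running sum and a running count can be accumulated chunk by chunk. The median is not foldable, since it cannot be recovered from per-chunk medians.

The sketch below uses only base R connections on a small hypothetical temporary file. iotools' `chunk.apply()` automates this read, process, and combine loop, but its arguments are not reproduced here.

```r
# Write a small hypothetical file to stand in for a large one on disk
path <- tempfile(fileext = ".csv")
writeLines(as.character(1:10), path)

chunk_size <- 3L             # lines read per chunk (tiny, for illustration)
con <- file(path, open = "r")
total <- 0
count <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file reached
  x <- as.numeric(lines)          # parse only the current chunk
  total <- total + sum(x)         # fold step: running sum
  count <- count + length(x)      # fold step: running count
}
close(con)
unlink(path)                      # remove the temporary file

chunk_mean <- total / count       # combine the folded pieces
chunk_mean
```

Only one chunk of lines is held in memory at a time. The memory needed therefore depends on `chunk_size`, not on the size of the file.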


4

Case Study: A Preliminary Analysis of the Housing Data

In the previous chapters, we introduced the housing data and showed how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we go through a preliminary analysis of the data, comparing various trends over time.

Overview of types of analysis for this chapter

Race and Ethnic Representation in the Mortgage Data

Comparing the Borrower Race/Ethnicity and their Proportions

Are the data missing at random?

Looking for Predictable Missingness

A little more about missingness

Analyzing the Housing Data

Borrower Race and Ethnicity by Year (II)

Visualizing the Adjusted Demographic Trends

Relative change in demographic trend

Borrower Lending Trends: City vs. Rural

Borrower Region by Year

Who is securing federally guaranteed loans?

Congratulations!

