メインコンテンツへスキップ

コース

Rで学ぶスケーラブルなデータ処理

上級スキルレベル

更新日 2024/08

Rでbigmemoryとiotoolsを使い、ビッグデータを扱うスケーラブルなコードの書き方を学びます。

コースを無料で開始

RProgramming

4時間

15 ビデオ

49 演習

3,950 XP

6,148

修了証明書

何千もの企業の従業員が支持

チームのトレーニングを担当していますか？

Businessをお試しください

コース説明

データセットは、利用可能なRAMより大きいことがよくあります。Rでは既定で変数がすべてメモリ上に保存されるため、これはプログラマーにとって課題になります。本コースでは、ディスク上のデータを直接処理・探索・分析するためのツールを学びます。さらに、split-apply-combine アプローチを実装し、bigmemory と iotools パッケージを使ってスケールするコードの書き方を身につけます。コース全体を通して、Federal Housing Finance Agency のデータ（Fannie Mae と Freddie Mac によって保有・証券化された、2009〜2015年のすべての住宅ローンを記録した公開データ）を活用します。

前提条件

Writing Efficient R Code

1

Working with increasingly large data sets

In this chapter, we cover the reasons you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data using the base R functions can be slow and some easy ways to remedy this. Finally, we introduce the bigmemory package.

What is Scalable Data Processing?

Why is your code slow?

How does processing time vary by data size?

Working with "Out-of-Core" Objects using the Bigmemory Project

Reading a big.matrix object

Attaching a big.matrix object

Creating tables with big.matrix objects

Data summary using bigsummary

References vs. Copies

Copying matrices and big matrices

チャプターを開始

2

Processing and Analyzing Data with bigmemory

Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.

The Bigmemory Suite of Packages

Tabulating using bigtable

Borrower Race and Ethnicity by Year (I)

Split-Apply-Combine

Female Proportion Borrowing

Visualize your results using the tidyverse

Visualizing Female Proportion Borrowing

The Borrower Income Ratio

Tidy Big Tables

Limitations of bigmemory

Where should you use bigmemory?

チャプターを開始

3

Working with iotools

We'll use the iotools package that can process both numeric and string data, and introduce the concept of chunk-wise processing.

Introduction to chunk-wise processing

Can you split-compute-combine it?

Foldable operations (I)

Foldable operations (II)

A first look at iotools: Importing data

Compare read.delim() and read.delim.raw()

Reading raw data and turning it into a data structure

chunk.apply

Reading chunks in as a matrix

Reading chunks in as a data.frame

Parallelizing calls to chunk.apply

チャプターを開始

4

Case Study: A Preliminary Analysis of the Housing Data

In the previous chapters, we've introduced the housing data and shown how to compute with data that is about as big, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.

Overview of types of analysis for this chapter

Race and Ethnic Representation in the Mortgage Data

Comparing the Borrower Race/Ethnicity and their Proportions

Are the data missing at random?

Looking for Predictable Missingness

A little more about missingness

Analyzing the Housing Data

Borrower Race and Ethnicity by Year (II)

Visualizing the Adjusted Demographic Trends

Relative change in demographic trend

Borrower Lending Trends: City vs. Rural

Borrower Region by Year

Who is securing federally guaranteed loans?

Congratulations!

チャプターを開始

Rで学ぶスケーラブルなデータ処理

コース完了

修了証明書を取得

この修了書をLinkedInや履歴書、CVに追加しましょう
ソーシャルメディアや人事評価で共有しましょう今すぐ登録

19百万人を超える学習者と共にRで学ぶスケーラブルなデータ処理を始めましょう！

DataCamp for Mobileでデータスキルを磨きましょう

モバイルコースと毎日の 5 分間のコーディングチャレンジで、外出先でも進歩できます。