
Course

Scalable Data Processing in R

Advanced level

Updated August 2024

Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.


R Programming

4 hours

15 videos

49 exercises

3,950 XP


Statement of Accomplishment


Course Description

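Data sets are often larger than the available RAM, which causes problems in R because, by default, every variable is stored in memory. In this course, you'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn to write scalable code with the bigmemory and iotools packages. Throughout the course, you'll work with a public data set from the Federal Housing Finance Agency that records every mortgage held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.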

Prerequisites

Writing Efficient R Code

1

Working with increasingly large data sets

In this chapter, we cover the reasons you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data with base R functions can be slow, and we present some easy ways to remedy this. Finally, we introduce the bigmemory package.

What is Scalable Data Processing?

Why is your code slow?

How does processing time vary by data size?

Working with "Out-of-Core" Objects using the Bigmemory Project

Reading a big.matrix object

Attaching a big.matrix object

Creating tables with big.matrix objects

Data summary using bigsummary

References vs. Copies

Copying matrices and big matrices
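The "References vs. Copies" and "Copying matrices and big matrices" lessons turn on one behavior: a big.matrix is a reference to data held outside R's managed heap, so assigning it to a new name does not copy the data. The following is a minimal sketch of that behavior, assuming the bigmemory package is installed. It uses a tiny in-RAM big.matrix instead of the course's file-backed mortgage data, and the course's own exercises may differ.

```r
# Assumes the bigmemory package is installed: install.packages("bigmemory")
library(bigmemory)

# A tiny in-RAM big.matrix stands in for a large, file-backed one.
x <- big.matrix(nrow = 3, ncol = 2, type = "double", init = 0)

# Assignment copies the reference, not the data: y points at the same matrix.
y <- x
y[1, 1] <- 5
x[1, 1]  # 5: the change made through y is visible through x

# deepcopy() allocates a new big.matrix with its own copy of the data.
z <- deepcopy(x)
z[1, 1] <- 10
x[1, 1]  # still 5: z is independent of x

# Contrast with an ordinary R matrix, which uses copy-on-modify semantics.
m <- matrix(0, nrow = 3, ncol = 2)
m2 <- m
m2[1, 1] <- 5
m[1, 1]  # 0: modifying m2 triggered a copy, so m is unchanged
```

The reference behavior is what makes big.matrix objects cheap to pass around, but it also means an accidental modification through one name affects every other name bound to the same matrix; deepcopy() is the explicit way to get an independent copy.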

2

Processing and Analyzing Data with bigmemory

Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
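The split-apply-combine pattern can be illustrated with base R's split() and lapply(). This is a sketch, not the course's code: the year and female vectors below are hypothetical toy stand-ins for columns of the mortgage data, which the course keeps in a big.matrix.

```r
# Toy data (hypothetical, not from the mortgage data set):
# one entry per loan, with its year and a 1/0 female-borrower flag.
year   <- c(2009, 2009, 2010, 2010, 2010, 2011)
female <- c(1,    0,    1,    1,    0,    0)

# Split: group the row indices (not the data) by year.
idx_by_year <- split(seq_along(year), year)

# Apply: compute the female proportion within each group.
prop_by_year <- lapply(idx_by_year, function(i) mean(female[i]))

# Combine: collapse the per-group results into one named vector.
result <- unlist(prop_by_year)
result  # named vector: 2009 = 0.5, 2010 = 2/3, 2011 = 0
```

Splitting row indices rather than the values themselves means only integers are grouped; the same idea lets each group's computation read just the rows it needs from a big.matrix instead of first copying whole columns into ordinary R vectors.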

The Bigmemory Suite of Packages

Tabulating using bigtable

Borrower Race and Ethnicity by Year (I)

Split-Apply-Combine

Female Proportion Borrowing

Visualize your results using the tidyverse

Visualizing Female Proportion Borrowing

The Borrower Income Ratio

Tidy Big Tables

Limitations of bigmemory

Where should you use bigmemory?

3

Working with iotools

We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.
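Chunk-wise processing works when the statistic is foldable: each chunk's partial results can be merged into the final answer. The following base R sketch computes a mean by reading a file a few lines at a time and keeping only a running sum and count in memory. The temporary file and the chunk_mean() helper are hypothetical illustrations, not part of iotools; iotools' chunk.apply() automates this read-process-combine loop.

```r
# Write a small temporary file (hypothetical data) so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:10), path)

# Hypothetical helper: compute a mean chunk by chunk, never holding the whole file.
chunk_mean <- function(path, chunk_size = 3) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)  # next chunk of up to chunk_size lines
    if (length(lines) == 0) break            # end of file: no more chunks
    values <- as.numeric(lines)
    total <- total + sum(values)             # the sum is foldable across chunks
    count <- count + length(values)          # so is the count
  }
  total / count                              # combine the partial results
}

chunk_mean(path)  # 5.5
```

A median is a counterexample: per-chunk medians generally cannot be combined into the overall median, so it cannot be computed chunk-wise in this simple way.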

Introduction to chunk-wise processing

Can you split-compute-combine it?

Foldable operations (I)

Foldable operations (II)

A first look at iotools: Importing data

Compare read.delim() and read.delim.raw()

Reading raw data and turning it into a data structure

chunk.apply

Reading chunks in as a matrix

Reading chunks in as a data.frame

Parallelizing calls to chunk.apply

4

Case Study: A Preliminary Analysis of the Housing Data

In the previous chapters, we introduced the housing data and showed how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.

Overview of types of analysis for this chapter

Race and Ethnic Representation in the Mortgage Data

Comparing the Borrower Race/Ethnicity and their Proportions

Are the data missing at random?

Looking for Predictable Missingness

A little more about missingness

Analyzing the Housing Data

Borrower Race and Ethnicity by Year (II)

Visualizing the Adjusted Demographic Trends

Relative change in demographic trend

Borrower Lending Trends: City vs. Rural

Borrower Region by Year

Who is securing federally guaranteed loans?

Congratulations!
