
Course

Scalable Data Processing in R

Skill level: Advanced

Updated 08.2024

Learn to write scalable code for working with big data in R, using the bigmemory and iotools packages.


R Programming

4 hours

15 videos

49 exercises

3,950 XP

6,148

Statement of Accomplishment


Course Description

Datasets often exceed the available RAM, which is a problem for R programmers because, by default, R keeps all variables in memory. You will learn tools for processing, exploring, and analyzing data directly from disk. You will also learn the split-apply-combine approach and how to write scalable code with the bigmemory and iotools packages. The course uses data from the Federal Housing Finance Agency (FHFA): a publicly available dataset covering all mortgages held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.

Prerequisites

Writing Efficient R Code

1

Working with increasingly large data sets

In this chapter, we cover why you need new techniques when data sets are larger than the available RAM. We show that importing and exporting data with base R functions can be slow, along with some easy ways to remedy this. Finally, we introduce the bigmemory package.

What is Scalable Data Processing?

Why is your code slow?

How does processing time vary by data size?

Working with "Out-of-Core" Objects using the Bigmemory Project

Reading a big.matrix object

Attaching a big.matrix object

Creating tables with big.matrix objects

Data summary using bigsummary

References vs. Copies

Copying matrices and big matrices


2

Processing and Analyzing Data with bigmemory

Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
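As a rough illustration of split-apply-combine, here is a minimal base R sketch on a small hypothetical data frame (the `loans` object and its `year` and `female` columns are made up for illustration and are not the course data). It splits row indices by year, applies a proportion calculation to each group, and combines the per-group results into one vector. Splitting indices rather than the data itself avoids copying rows; the same pattern should carry over to a `big.matrix`, which supports `[` indexing, although the course's own code may differ.

```r
# Hypothetical toy data standing in for the mortgage records
loans <- data.frame(
  year   = c(2009, 2009, 2010, 2010, 2010),
  female = c(1, 0, 1, 1, 0)
)

# Split: group row indices (not the rows themselves) by year
idx_by_year <- split(seq_len(nrow(loans)), loans$year)

# Apply: compute the female proportion within each year's rows
prop_by_year <- lapply(idx_by_year, function(i) mean(loans$female[i]))

# Combine: collapse the per-year results into one named numeric vector
female_prop <- unlist(prop_by_year)
print(female_prop)
```

Only the index vectors are grouped, so the apply step reads just the rows it needs from the original object.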

The Bigmemory Suite of Packages

Tabulating using bigtable

Borrower Race and Ethnicity by Year (I)

Split-Apply-Combine

Female Proportion Borrowing

Visualize your results using the tidyverse

Visualizing Female Proportion Borrowing

The Borrower Income Ratio

Tidy Big Tables

Limitations of bigmemory

Where should you use bigmemory?


3

Working with iotools

We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.
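Chunk-wise processing works for operations that can be folded: each chunk is reduced to a small partial result, and the partial results are combined as you go. The mean is a common example, because it can be recovered from a running sum and a running count. The sketch below deliberately uses only base R (`file()` and `readLines()`) on a hypothetical temporary file rather than iotools; in iotools, `chunk.apply()` takes over the chunk-reading loop. A median, by contrast, cannot be assembled from per-chunk medians this way.

```r
# Hypothetical single-column file: one number per line
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:10), path)

chunk_size <- 3L
con <- file(path, open = "r")

total <- 0    # running sum, folded across chunks
count <- 0L   # running row count, folded across chunks

repeat {
  lines <- readLines(con, n = chunk_size)  # read the next chunk of lines
  if (length(lines) == 0L) break           # no lines left: end of file
  x <- as.numeric(lines)
  total <- total + sum(x)                  # combine this chunk's partial sum
  count <- count + length(x)               # combine this chunk's partial count
}
close(con)

# The mean is derived from the folded (sum, count) pair at the end
chunked_mean <- total / count
print(chunked_mean)
```

Only one chunk of at most `chunk_size` lines is held in memory at a time, while the state carried between chunks stays two numbers regardless of file size.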

Introduction to chunk-wise processing

Can you split-compute-combine it?

Foldable operations (I)

Foldable operations (II)

A first look at iotools: Importing data

Compare read.delim() and read.delim.raw()

Reading raw data and turning it into a data structure

chunk.apply

Reading chunks in as a matrix

Reading chunks in as a data.frame

Parallelizing calls to chunk.apply


4

Case Study: A Preliminary Analysis of the Housing Data

In the previous chapters, we introduced the housing data and showed how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.

Overview of types of analysis for this chapter

Race and Ethnic Representation in the Mortgage Data

Comparing the Borrower Race/Ethnicity and their Proportions

Are the data missing at random?

Looking for Predictable Missingness

A little more about missingness

Analyzing the Housing Data

Borrower Race and Ethnicity by Year (II)

Visualizing the Adjusted Demographic Trends

Relative change in demographic trend

Borrower Lending Trends: City vs. Rural

Borrower Region by Year

Who is securing federally guaranteed loans?

Congratulations!
