As data sets grow and algorithms become more complex, parallel processing is often essential, and users increasingly expect it in packages that implement time-consuming methods. This course introduces the concepts and tools available in R for parallel computing and offers solutions to several non-trivial issues in parallel processing, such as reproducibility, random number generation, and load balancing.
To take advantage of a parallel environment, an application needs to be split into pieces. In this introductory chapter, you will learn about different ways of partitioning a problem and how they fit different hardware configurations. You will also be introduced to various R packages that support parallel programming.
This chapter dives deeper into the parallel package. You'll learn about the various backends and their differences, and gain a deep understanding of the workhorse of the package, the clusterApply() function. Strategies for task segmentation, including their pitfalls, are also discussed.
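As a rough illustration of the clusterApply() workflow described above, here is a minimal sketch using only the base parallel package. The worker count of 2, the helper name sum_squares, and the input sizes are assumptions made for this example and are not taken from the course.

```r
library(parallel)

# Start a small socket (PSOCK) cluster; 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# Hypothetical task: sum of the squares 1^2 + 2^2 + ... + n^2.
sum_squares <- function(n) sum(seq_len(n)^2)

inputs <- c(10, 100, 1000)

# clusterApply() sends one element of `inputs` to each worker in turn
# and returns the results as a list, in the order of the inputs.
par_res <- clusterApply(cl, inputs, sum_squares)

# The sequential equivalent, for comparison.
seq_res <- lapply(inputs, sum_squares)

# Always shut the workers down when finished.
stopCluster(cl)
```

Because each task here is tiny, the communication overhead would likely outweigh any gain from parallelism. The sketch is meant only to show the shape of the call.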
In this chapter, you will look at two user-contributed packages, namely foreach and future.apply, which make parallel programming in R even easier. They are built on top of the parallel and future packages. In the last lesson of this chapter, you will learn about the advantages and pitfalls of load balancing and scheduling.
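The load-balancing idea mentioned above can be sketched with the base parallel package rather than foreach or future.apply, so that the example has no extra dependencies. The function slow_square, the task sizes, and the worker count are hypothetical choices made for illustration only.

```r
library(parallel)

cl <- makeCluster(2)  # 2 workers, an arbitrary choice

# Hypothetical task whose run time grows with its input.
slow_square <- function(x) {
  Sys.sleep(x / 100)
  x^2
}

# One long task followed by several short ones.
tasks <- c(5, 1, 1, 1, 1, 1)

# Static scheduling: tasks are assigned to workers in a fixed
# round-robin order, regardless of how long each one takes.
static_res <- clusterApply(cl, tasks, slow_square)

# Load-balanced scheduling: the next task goes to whichever worker
# becomes free first.
lb_res <- clusterApplyLB(cl, tasks, slow_square)

stopCluster(cl)
```

Both calls return results in the order of the inputs, so the two lists hold the same values. Load balancing tends to help when task durations vary widely. Its cost is extra communication, and the assignment of tasks to workers is no longer deterministic, which matters once random numbers are involved.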
Now you might ask: can I reproduce my results if the application uses random numbers? Can I get the same results regardless of whether the code runs sequentially or in parallel? This chapter answers these questions. You will learn about a random number generator well suited to a parallel environment and how the various packages make use of it.
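One way to get repeatable random numbers on a cluster is clusterSetRNGStream() from the base parallel package, which gives each worker its own stream of the L'Ecuyer-CMRG generator. The sketch below assumes 2 workers, a hypothetical helper named draw, and an arbitrary seed of 1234. It covers only run-to-run reproducibility with a fixed number of workers and static scheduling, not agreement between sequential and parallel runs.

```r
library(parallel)

cl <- makeCluster(2)  # 2 workers, an arbitrary choice

# Hypothetical task: draw three standard normal values.
draw <- function(i) rnorm(3)

# Seed the workers' RNG streams, then run the tasks.
clusterSetRNGStream(cl, iseed = 1234)
run1 <- clusterApply(cl, 1:4, draw)

# Reset the streams with the same seed and run the same tasks again.
clusterSetRNGStream(cl, iseed = 1234)
run2 <- clusterApply(cl, 1:4, draw)

stopCluster(cl)

# TRUE when the two runs produced identical draws.
same_draws <- identical(run1, run2)
```

The streams are attached to workers rather than to tasks, so changing the number of workers or switching to a load-balanced scheduler such as clusterApplyLB() can change which numbers each task receives.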
Prerequisites: Writing Efficient R Code
Senior Research Scientist, University of Washington