Matt Dowle is the main author of the data.table package. Matt has worked for some of the world’s largest financial organizations and has been programming in R for over a decade.
Arun is one of the main contributors to the data.table package. He started using R in late 2011 and works as a data analyst at Open Analytics. He has a passion for developing tools and applying algorithms that facilitate big-data analyses, and routinely works with datasets on the order of several gigabytes.
The R data.table package is rapidly making its name as the number one choice for handling large datasets in R. This online data.table tutorial will take you from data.table novice to expert in no time. Once you have been introduced to the general form of a data.table query, you will learn techniques for subsetting your data.table, how to update by reference, and how to use data.table's set() family in your workflow. The course finishes with more complex concepts such as indexing, keys, and fast ordered joins. Upon completion of the course, you will be able to use data.table in R to manipulate and analyze data more efficiently. Enjoy!
An introduction to what exactly a data.table is, how it differs from the traditional data.frame in R, and the general form of a data.table query.
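As a rough illustration of that general form, here is a minimal sketch of a query written as DT[i, j, by]. The table `DT` and its columns `group` and `value` are made-up toy data, not material from the course.

```r
library(data.table)

# Hypothetical toy data (an assumption for illustration only)
DT <- data.table(group = c("a", "b", "a", "b"),
                 value = c(1, 2, 3, 4))

# General form: DT[i, j, by]
#   i  = which rows to use
#   j  = what to compute
#   by = which column(s) to group by
res <- DT[value > 1, .(total = sum(value)), by = group]
print(res)
```

A data.table is also a data.frame, so it can still be passed to functions that expect one; the difference lies mainly in what the square brackets accept.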
Learn how to perform multiple operations on the same data.table in a single statement, how to easily take a subset of your data, how to update by reference, and how to work with the data.table set() family.
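The ideas in this chapter can be sketched in a few lines, assuming a small made-up table. The names `DT`, `id`, `x`, `y`, and `out` are hypothetical and chosen only for this example.

```r
library(data.table)

# Hypothetical toy data (an assumption for illustration only)
DT <- data.table(id = 1:4, x = c(10, 20, 30, 40))

# Update by reference: := adds column y to DT in place, without copying DT
DT[, y := x * 2]

# Chaining: subset first, then summarise, in one statement
out <- DT[x > 10][, .(mean_y = mean(y))]

# set() changes a value by reference; it is typically used inside loops
set(DT, i = 1L, j = "x", value = 0)

print(out)
print(DT)
```

Because `:=` and `set()` modify the table in place, no assignment with `<-` is needed for those two steps.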
Discover the potential behind indexing, followed by generating and using keys. The final part focuses on fast ordered joins.
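A minimal sketch of keys and a join, again on made-up data: the tables `sales` and `names_dt` and their columns are assumptions for illustration, not course material.

```r
library(data.table)

# Hypothetical toy tables (assumptions for illustration only)
sales    <- data.table(id = c(3L, 1L, 2L), amount = c(30, 10, 20))
names_dt <- data.table(id = c(1L, 2L), name = c("one", "two"))

# setkey() sorts sales by id in place and marks id as the key
setkey(sales, id)

# Keyed subset: look up the row whose key value is 2
row2 <- sales[.(2L)]

# Join: for each row of names_dt, find the matching row in sales
joined <- sales[names_dt, on = "id"]

print(row2)
print(joined)
```

Setting a key physically reorders the table, which is what makes subsets and joins on the key column fast.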