Dhavide Aruliah

Director of Training at Anaconda

Dhavide Aruliah is Director of Training at Anaconda, the creator and driving force behind the leading Open Data Science platform powered by Python. Dhavide was previously an Associate Professor at the University of Ontario Institute of Technology (UOIT), where he served as Program Director for various undergraduate and postgraduate programs. His research interests include computational inverse problems, numerical linear algebra, and high-performance computing. The materials for this course were produced by the Anaconda training team.

Matthew Rocklin

Lead Developer of Dask and Computational Scientist at Anaconda

Matthew Rocklin is an open source software developer at Anaconda focusing on efficient computation and parallel computing, primarily within the Python ecosystem. He has contributed to many of the PyData libraries and is the Lead Developer of the Dask library for parallel computing. Matthew holds a PhD in computer science from the University of Chicago, where he focused on numerical linear algebra, task scheduling, and computer algebra.

Collaborator(s)
  • Hugo Bowne-Anderson
  • Yashas Roy

Course Description

Python is now well established as a major platform for data analysis and data science. For many data scientists, the largest limitation of Python's standard data tools is that all data must fit into the resident memory (RAM) of the available workstation. Further, these tools have traditionally been able to use only a single CPU core. Data scientists constantly ask, "How can I read and process large amounts of data?" and "How can I make use of more computational processing resources?" This course will introduce you to Dask, a flexible parallel computing library for analytic computing. With Dask, you will be able to take your existing Python workflows and scale them up to large datasets on your workstation without migrating to a distributed computing environment.

  1. Working with Big Data

  2. Working with Dask Arrays

  3. Working with Dask DataFrames

  4. Working with Dask Bags for Unstructured Data

  5. Case Study: Analyzing Flight Delays