skill track

Big Data with PySpark

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will apply parallel computation to large datasets and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Song Dataset.

Python · 24 hours · 6 courses · Statement of Accomplishment



Introduction to PySpark

Learn to implement distributed data management and machine learning in Spark using the PySpark package.

4 hours

Nick Solomon

Data Scientist


  • Nick Solomon
  • Lore Dirick
  • Upendra Kumar Devisetty
  • Mike Metzger