Skill Track

Big Data with PySpark

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will apply parallel computation to large datasets and prepare for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Song Dataset.

  • Python
  • 24 hours
  • 6 courses

Introduction to PySpark

Learn to implement distributed data management and machine learning in Spark using the PySpark package.

4 hours
Nick Solomon

Data Scientist


Upendra Kumar Devisetty

Science Analyst at CyVerse

Mike Metzger

Data Engineer Consultant @ Flexible Creations

