Skill Track

Big Data with PySpark

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will apply parallel computation to large datasets and prepare for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Song Dataset.

  • Python
  • 24 hours
  • 6 courses
1

Introduction to PySpark

Learn to implement distributed data management and machine learning in Spark using the PySpark package.

4 hours
Nick Solomon

Data Scientist

3

Cleaning Data with PySpark

Learn how to clean data with Apache Spark in Python.

4 hours
Mike Metzger

Data Engineer Consultant @ Flexible Creations

Instructors

Nick Solomon

Data Scientist

Upendra Kumar Devisetty

Science Analyst at CyVerse

Mike Metzger

Data Engineer Consultant @ Flexible Creations
