Skill Track

Big Data with PySpark

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will run parallel computations on large datasets and prepare for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Song Dataset.

  • Python
  • 24 hours
  • 6 courses
1

Introduction to PySpark

Learn to implement distributed data management and machine learning in Spark using the PySpark package.

4 hours
Nick Solomon

Data Scientist

3

Cleaning Data with PySpark

Learn how to clean data with Apache Spark in Python.

4 hours
Mike Metzger

Data Engineer Consultant @ Flexible Creations

Instructors

Nick Solomon

Data Scientist

Upendra Kumar Devisetty

Science Analyst at CyVerse

Mike Metzger

Data Engineer Consultant @ Flexible Creations
