Spark courses

With Spark, data is read into memory, operations are performed, and the results are written back, resulting in faster execution. Learn core principles and common packages on DataCamp.

Recommended for Spark beginners

Build your Spark skills with interactive courses curated by real-world experts

Course

Foundations of PySpark

Intermediate skill level
4.7+
601 reviews
4 hours
Learn to implement distributed data management and machine learning in Spark using the PySpark package.

Track

Big Data with PySpark

3.6+
6 reviews
25 hours
Learn how to process big data and put it to efficient use with Apache Spark's PySpark API.

Browse Spark courses and tracks

Course

Introduction to PySpark

Intermediate skill level
4.7+
2,536 reviews
4 hours
Master PySpark to handle big data with ease: learn to process, query, and optimize large datasets for powerful analytics!

Course

Machine Learning with PySpark

Advanced skill level
4.8+
689 reviews
4 hours
Make predictions from data with Apache Spark. Covers decision trees, logistic regression, linear regression, ensembles, and pipelines.

Course

Introduction to Spark SQL in Python

Advanced skill level
4.7+
142 reviews
4 hours
Learn how to manipulate data and create machine learning feature sets in Spark using SQL in Python.

Course

Foundations of PySpark

Intermediate skill level
4.7+
601 reviews
4 hours
Learn to implement distributed data management and machine learning in Spark using the PySpark package.

Course

Feature Engineering with PySpark

Advanced skill level
4.8+
286 reviews
4 hours
Get hands-on with data cleaning and feature engineering, the core work on which data scientists spend 70–80% of their time.

Course

Introduction to Spark with sparklyr in R

Intermediate skill level
4.7+
81 reviews
4 hours
Learn how to run big data analysis using Spark and the sparklyr package in R, and explore Spark MLlib in just 4 hours.

Spark resources

Blog

The Top 20 Spark Interview Questions

Essential Spark interview questions with example answers for job-seekers, data professionals, and hiring managers.

Tim Lu

Blog

Flink vs. Spark: A Comprehensive Comparison

Comparing Flink vs. Spark, two open-source frameworks at the forefront of batch and stream processing.

Maria Eugenia Inzaugarat

8 min

Tutorial

PySpark Tutorial: Getting Started with PySpark

A hands-on PySpark tutorial: install PySpark, explore data with DataFrames, and build a K-Means clustering model for customer segmentation.

Natassha Selvaraj

15 min


Ready to apply your skills?

Projects allow you to apply your knowledge to a wide range of datasets to solve real-world problems in your browser

Frequently asked questions

Which Spark course is the best for absolute beginners?

For new learners, DataCamp has three introductory Spark courses across the most popular programming languages:

Introduction to PySpark 

Introduction to Spark with sparklyr in R 

Introduction to Spark SQL in Python

Do I need any prior experience to take a Spark course?

You’ll need to have completed an introductory course in the programming language you plan to use with Spark.

You can find all of these here:

Introduction to Python

Introduction to R

Introduction to SQL

Beyond that, anyone can get started with Spark through simple, interactive exercises on DataCamp.

What is PySpark used for?

If you're already familiar with Python and libraries such as pandas, then PySpark is a good tool to learn for building more scalable analyses and pipelines.

Apache Spark is a computational engine that handles very large datasets by processing them in parallel, typically in batches.

Spark is written in Scala, and PySpark is the Python API that lets you work with Spark from Python.
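
To make the idea of parallel processing more concrete, here is a minimal sketch in plain Python. It is an analogy, not PySpark code: it splits a small dataset into partitions, counts words in each partition on a separate worker thread, and merges the partial results. The helper names `partition`, `count_words`, and `parallel_word_count` are hypothetical and exist only for this illustration; Spark itself distributes partitions across a cluster rather than across local threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into num_partitions chunks (round-robin assignment)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """Count words in one partition, independently of all other partitions."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Count words per partition in parallel, then merge the partial counts."""
    chunks = partition(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up sample data for illustration only.
lines = ["spark is fast", "spark is scalable", "python is popular"]
result = parallel_word_count(lines, num_partitions=2)
print(result["spark"], result["is"])  # prints: 2 3
```

The split, process, and merge steps are the part that carries over: each partition is handled without knowledge of the others, so adding workers is one way to handle more data.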

How can Spark help my career?

You’ll gain the ability to analyze data and train machine learning models on large-scale datasets—a valuable skill for becoming a data scientist. 

Having the expertise to work with big data frameworks like Apache Spark will set you apart.

What is Apache Spark?

Apache Spark is an open-source, distributed processing system used for big data workloads. 

It uses in-memory caching and optimized query execution for fast analytic queries against data of any size.

It provides development APIs in Java, Scala, Python, and R, and supports code reuse across multiple workloads—batch processing, interactive queries, real-time analytics, machine learning, and graph processing.
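
As a rough illustration of why in-memory caching speeds up repeated queries, the plain-Python sketch below reads a dataset once, keeps it in memory, and answers two queries from that copy. It is not Spark code: `CachedDataset` and `load_sales` are hypothetical names, and the sales figures are made up. In PySpark, the comparable step is usually calling `cache()` on a DataFrame before running several queries against it.

```python
class CachedDataset:
    """Load a dataset once, keep it in memory, and reuse it across queries."""

    def __init__(self, loader):
        self._loader = loader   # function that performs the expensive read
        self._rows = None       # in-memory copy, filled on first access
        self.load_count = 0     # how many times the source was actually read

    def rows(self):
        # Only hit the underlying source if nothing is cached yet.
        if self._rows is None:
            self._rows = list(self._loader())
            self.load_count += 1
        return self._rows


def load_sales():
    """Stand-in for a slow read from disk or remote storage (made-up data)."""
    return [("books", 12.0), ("games", 30.0), ("books", 8.0)]


sales = CachedDataset(load_sales)

# Two different queries, both served from the same in-memory copy.
total = sum(amount for _, amount in sales.rows())
books = sum(amount for category, amount in sales.rows() if category == "books")

print(total, books, sales.load_count)  # prints: 50.0 20.0 1
```

The second query never touches the source again, which is where the time saving comes from when the read is the slow part.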
