본문으로 바로가기

강의

Python으로 배우는 Deep Reinforcement Learning

고급기술 수준

업데이트됨 2024. 9.

정교화와 최적화 기법을 포함한 강력한 Deep Reinforcement Learning 알고리즘을 학습하고 활용합니다.

무료로 강의 시작

PyTorchArtificial Intelligence

4시간

15 동영상

49 연습 문제

4,050 XP

5,668

성취 증명서

수천 개 기업의 학습자들이 사랑하는

팀을 교육하시나요?

비즈니스용으로 체험해 보세요

강의 설명

기계가 학습하고 주변 환경과 상호작용할 수 있게 하는 최첨단 기법을 알아보세요. 딥 강화 학습(DRL)의 세계에 뛰어들어 이 분야를 이끄는 가장 강력한 알고리즘을 직접 실습하며 경험하게 됩니다. PyTorch와 Gymnasium 환경을 사용하여 자신만의 에이전트를 구축하게 됩니다.

딥 강화 학습의 기초를 마스터하세요

우리의 여정은 DRL의 기초와 전통적인 강화 학습과의 관계에서 시작됩니다. 그다음에는 PyTorch에서 Deep Q-Networks(DQN)를 빠르게 구현하는 단계로 넘어가며, Double DQN과 Prioritized Experience Replay 같은 고급 개선 기법까지 포함해 모델의 성능을 한층 끌어올립니다.정책 기반 방법을 탐구하며 실력을 한 단계 끌어올리세요. REINFORCE와 Actor-Critic 기법과 같은 필수 정책 경사 기법을 배우고 구현하게 됩니다.

최첨단 알고리즘 사용하기

오늘날 업계에서 널리 사용되는 강력한 DRL 알고리즘들을 접하게 되며, 여기에는 Proximal Policy Optimization(PPO)도 포함됩니다. 로봇공학, 게임 AI 등에서 혁신을 이끄는 기법을 실무적으로 익히게 됩니다. 마지막으로, 하이퍼파라미터 튜닝을 위해 Optuna를 사용해 모델을 최적화하는 방법을 배우게 됩니다.이 강의를 마치면, 이러한 최첨단 기법을 실제 문제에 적용하고 DRL의 잠재력을 최대한 활용할 수 있는 역량을 갖추게 됩니다!

선수 조건

Intermediate Deep Learning with PyTorch Reinforcement Learning with Gymnasium in Python

1

Introduction to Deep Reinforcement Learning

Discover how deep reinforcement learning improves upon traditional Reinforcement Learning while studying and implementing your first Deep Q Learning algorithm.

Introduction to deep reinforcement learning

Environment and neural network setup

DRL training loop

Introduction to deep Q learning

Deep learning and DQN

The Q-Network architecture

Instantiating the Q-Network

The barebone DQN algorithm

Barebone DQN action selection

Barebone DQN loss function

Training the barebone DQN

2

Deep Q-learning

Dive into Deep Q-learning by implementing the original DQN algorithm, featuring Experience Replay, epsilon-greediness and fixed Q-targets. Beyond DQN, you will then explore two fascinating extensions that improve the performance and stability of Deep Q-learning: Double DQN and Prioritized Experience Replay.

DQN with experience replay

The double-ended queue

Experience replay buffer

DQN with experience replay

The complete DQN algorithm

Epsilon-greediness

Fixed Q-targets

Implementing the complete DQN algorithm

Online network and target network in DDQN

Training the double DQN

Prioritized experience replay

Prioritized experience replay buffer

Sampling from the PER buffer

DQN with prioritized experience replay

3

Introduction to Policy Gradient Methods

Learn about the foundational concepts of policy gradient methods found in DRL. You will begin with the policy gradient theorem, which forms the basis for these methods. Then, you will implement the REINFORCE algorithm, a powerful approach to learning policies. The chapter will then guide you through Actor-Critic methods, focusing on the Advantage Actor-Critic (A2C) algorithm, which combines the strengths of both policy gradient and value-based methods to enhance learning efficiency and stability.

Introduction to policy gradient

The policy network architecture

Working with discrete distributions

Policy gradient and REINFORCE

Action selection in REINFORCE

Training the REINFORCE algorithm

Advantage Actor Critic

Critic network

Actor Critic loss calculations

Training the A2C algorithm

4

Proximal Policy Optimization and DRL Tips

Explore Proximal Policy Optimization (PPO) for robust DRL performance. Next, you will examine using an entropy bonus in PPO, which encourages exploration by preventing premature convergence to deterministic policies. You'll also learn about batch updates in policy gradient methods. Finally, you will learn about hyperparameter optimization with Optuna, a powerful tool for optimizing performance in your DRL models.

Proximal policy optimization

The clipped probability ratio

The clipped surrogate objective function

Entropy bonus and PPO

Entropy playground

Training the PPO algorithm

Batch updates in policy gradient

Minibatch and DRL

A2C with batch updates

Hyperparameter optimization with Optuna

Hyperparameter or not?

Hands-on with Optuna

Congratulations!

Python으로 배우는 Deep Reinforcement Learning

강의
완료

수료증 획득

LinkedIn 프로필, 이력서 또는 CV에 이 인증서를 추가하세요
소셜 미디어와 성과 평가에서 공유하세요지금 등록

19백만 명 이상의 학습자와 함께 Python으로 배우는 Deep Reinforcement Learning을(를) 시작하세요!

DataCamp for Mobile을 통해 데이터 분석 능력을 향상시키세요.

모바일 강좌와 매일 5분 코딩 챌린지를 통해 이동 중에도 학습 효과를 높이세요.