Course

Deep Reinforcement Learning in Python

Advanced skill level

Updated September 2024

Learn and use powerful Deep Reinforcement Learning algorithms, including refinement and optimization techniques.


PyTorch, Artificial Intelligence

4 hours

15 videos

49 exercises

4,050 XP

5,668

Statement of Accomplishment


Course Description

Discover the cutting-edge techniques that empower machines to learn and interact with their environments. You will dive into the world of Deep Reinforcement Learning (DRL) and gain hands-on experience with the most powerful algorithms driving the field forward. You will use PyTorch and Gymnasium environments to build your own agents.

Master the Fundamentals of Deep Reinforcement Learning

Our journey begins with the foundations of DRL and its relationship to traditional Reinforcement Learning. From there, we swiftly move on to implementing Deep Q-Networks (DQN) in PyTorch, including advanced refinements such as Double DQN and Prioritized Experience Replay to supercharge your models.

Take your skills to the next level as you explore policy-based methods. You will learn and implement essential policy-gradient techniques such as REINFORCE and Actor-Critic methods.

Use Cutting-edge Algorithms

You will encounter powerful DRL algorithms commonly used in the industry today, including Proximal Policy Optimization (PPO). You will gain practical experience with the techniques driving breakthroughs in robotics, game AI, and beyond. Finally, you will learn to optimize your models using Optuna for hyperparameter tuning.

By the end of this course, you will have acquired the skills to apply these cutting-edge techniques to real-world problems and harness DRL's full potential!

Prerequisites

Intermediate Deep Learning with PyTorch

Reinforcement Learning with Gymnasium in Python

1

Introduction to Deep Reinforcement Learning

Discover how Deep Reinforcement Learning improves on traditional Reinforcement Learning as you study and implement your first Deep Q-learning algorithm.

Introduction to deep reinforcement learning

Environment and neural network setup

DRL training loop

Introduction to deep Q-learning

Deep learning and DQN

The Q-Network architecture

Instantiating the Q-Network

The barebone DQN algorithm

Barebone DQN action selection

Barebone DQN loss function

Training the barebone DQN
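The barebone DQN lessons above revolve around two pieces: picking the action with the highest Q-value, and regressing Q(s, a) toward the Bellman target r + γ·max Q(s', a'). The course builds these with PyTorch networks. The framework-free sketch below instead uses plain Python lists to stand in for network outputs. The helper names `select_action`, `td_target` and `squared_td_loss` are hypothetical and do not come from the course.

```python
from typing import Sequence


def select_action(q_values: Sequence[float]) -> int:
    """Greedy action selection: return the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float = 0.99) -> float:
    """Bellman target r + gamma * max_a' Q(s', a'); terminal states do not bootstrap."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(q_value: float, target: float) -> float:
    """Squared TD error between the predicted Q(s, a) and its target."""
    return (target - q_value) ** 2


# Toy numbers standing in for Q-network outputs.
action = select_action([1.0, 2.5, 0.3])                     # -> 1
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.5)  # -> 2.0
loss = squared_td_loss(1.5, target)                         # -> 0.25
```

In a real DQN the squared error is averaged over a batch and minimized by gradient descent on the network weights. The arithmetic per transition is the same.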

2

Deep Q-learning

Dive into Deep Q-learning by implementing the original DQN algorithm, featuring Experience Replay, epsilon-greediness and fixed Q-targets. Beyond DQN, you will then explore two fascinating extensions that improve the performance and stability of Deep Q-learning: Double DQN and Prioritized Experience Replay.

DQN with experience replay

The double-ended queue

Experience replay buffer

DQN with experience replay

The complete DQN algorithm

Epsilon-greediness

Fixed Q-targets

Implementing the complete DQN algorithm

Online network and target network in DDQN

Training the double DQN

Prioritized experience replay

Prioritized experience replay buffer

Sampling from the PER buffer

DQN with prioritized experience replay
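Three ingredients from this chapter can be sketched in a few lines of standard-library Python. The first is an experience replay buffer built on the double-ended queue. The second is epsilon-greedy exploration with a decaying epsilon. The third is the Double DQN target, in which the online network selects the next action and the target network evaluates it. The class and function names, the exponential decay schedule and its default constants are all illustrative assumptions and are not taken from the course. Q-values are again plain lists rather than PyTorch tensors.

```python
import math
import random
from collections import deque
from typing import Any, Deque, List, Sequence, Tuple

# One transition: (state, action, reward, next_state, done)
Transition = Tuple[Any, int, float, Any, bool]


class ReplayBuffer:
    """Fixed-capacity experience replay buffer backed by a double-ended queue."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) discards the oldest transition once the buffer is full
        self.memory: Deque[Transition] = deque(maxlen=capacity)

    def push(self, state: Any, action: int, reward: float,
             next_state: Any, done: bool) -> None:
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive steps
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


def epsilon_at(step: int, start: float = 0.9, end: float = 0.05,
               decay: float = 1000.0) -> float:
    """Exploration rate decaying exponentially from `start` towards `end`."""
    return end + (start - end) * math.exp(-step / decay)


def epsilon_greedy(q_values: Sequence[float], epsilon: float) -> int:
    """Random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward: float, online_next_q: Sequence[float],
                      target_next_q: Sequence[float], done: bool,
                      gamma: float = 0.99) -> float:
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best]
```

With fixed Q-targets alone, the target would be `reward + gamma * max(target_next_q)`. Decoupling action selection from action evaluation is what lets Double DQN reduce the overestimation of Q-values. Prioritized experience replay would replace the uniform `random.sample` call with sampling weighted by each transition's TD error.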

3

Introduction to Policy Gradient Methods

Learn about the foundational concepts of policy gradient methods found in DRL. You will begin with the policy gradient theorem, which forms the basis for these methods. Then, you will implement the REINFORCE algorithm, a powerful approach to learning policies. The chapter will then guide you through Actor-Critic methods, focusing on the Advantage Actor-Critic (A2C) algorithm, which combines the strengths of both policy gradient and value-based methods to enhance learning efficiency and stability.

Introduction to policy gradient

The policy network architecture

Working with discrete distributions

Policy gradient and REINFORCE

Action selection in REINFORCE

Training the REINFORCE algorithm

Advantage Actor-Critic

Critic network

Actor-Critic loss calculations

Training the A2C algorithm
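The arithmetic behind this chapter fits in a short sketch. It covers sampling from a discrete action distribution, discounted returns and the loss for REINFORCE, and the one-step A2C actor and critic losses, where the TD error serves as the advantage. This is an illustrative plain-Python version with hypothetical function names. In PyTorch one would typically use `torch.distributions.Categorical` for sampling and log-probabilities, and the losses would be tensors that carry gradients. REINFORCE is shown here in one common per-step-return form, which may differ in detail from the course's version.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_action(probs: Sequence[float]) -> Tuple[int, float]:
    """Draw an action from a categorical distribution; return it with its log-probability."""
    action = random.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, accumulated backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: Sequence[float], returns: Sequence[float]) -> float:
    """Negative return-weighted log-likelihood; minimizing it ascends the policy gradient."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


def a2c_losses(reward: float, value: float, next_value: float, done: bool,
               log_prob: float, gamma: float = 0.99) -> Tuple[float, float]:
    """One-step A2C: the TD error serves as the advantage estimate."""
    td_target = reward + gamma * next_value * (1.0 - float(done))
    advantage = td_target - value
    # In an autodiff framework the advantage would be detached in the actor loss
    actor_loss = -log_prob * advantage
    critic_loss = advantage ** 2
    return actor_loss, critic_loss
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. The critic learns to predict these values, and the actor is pushed toward actions whose outcome beats the critic's estimate.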

4

Proximal Policy Optimization and DRL Tips

Explore Proximal Policy Optimization (PPO) for robust DRL performance. Next, you will examine using an entropy bonus in PPO, which encourages exploration by preventing premature convergence to deterministic policies. You'll also learn about batch updates in policy gradient methods. Finally, you will learn about hyperparameter optimization with Optuna, a powerful tool for optimizing performance in your DRL models.

Proximal policy optimization

The clipped probability ratio

The clipped surrogate objective function

Entropy bonus and PPO

Entropy playground

Training the PPO algorithm

Batch updates in policy gradient

Minibatch and DRL

A2C with batch updates

Hyperparameter optimization with Optuna

Hyperparameter or not?

Hands-on with Optuna

Congratulations!
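PPO's clipped surrogate objective for a single sample is min(r·A, clip(r, 1−ε, 1+ε)·A). Here r = π_new(a|s) / π_old(a|s) is the probability ratio and A is the advantage. An entropy bonus is subtracted from the loss to discourage premature collapse to a deterministic policy. The plain-Python sketch below makes that arithmetic concrete. The names and the defaults `epsilon=0.2` and `c_entropy=0.01` are illustrative assumptions, not values taken from the course. A batched PyTorch version would average this loss over a minibatch of samples.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_clipped_objective(new_log_prob: float, old_log_prob: float,
                          advantage: float, epsilon: float = 0.2) -> float:
    """Clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = math.exp(new_log_prob - old_log_prob)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_actor_loss(new_log_prob: float, old_log_prob: float, advantage: float,
                   probs: Sequence[float], epsilon: float = 0.2,
                   c_entropy: float = 0.01) -> float:
    """Loss to minimize: negative surrogate minus a weighted entropy bonus."""
    surrogate = ppo_clipped_objective(new_log_prob, old_log_prob, advantage, epsilon)
    return -surrogate - c_entropy * entropy(probs)
```

Taking the minimum makes the objective pessimistic. With a positive advantage, the gain is capped once the ratio exceeds 1 + ε. With a negative advantage, the unclipped term is kept whenever it is worse, so large policy moves are never rewarded.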
