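Reinforcement Learning from Human Feedback (RLHF)

Skill level: Advanced

Updated: October 2024

Learn how to build GenAI that accurately reflects human values, and gain hands-on skills with advanced LLMs.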

Python · Artificial Intelligence · 4 hours · 13 videos · 38 exercises · 2,900 XP · 3,492 statements of accomplishment

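Course description

This course covers Reinforcement Learning from Human Feedback (RLHF), which combines the efficiency of generative AI with the understanding that human expertise provides. You will learn how GenAI models can accurately reflect human values and preferences, and you will practice with LLMs. You will also work through the complexities of reward models and learn techniques for building LLM-based AI that adapts to real-world use.

Prerequisites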

Deep Reinforcement Learning in Python

1

Foundational Concepts

This chapter introduces the basics of Reinforcement Learning from Human Feedback (RLHF), a technique that uses human input to help AI models learn more effectively. Get started with RLHF by understanding how it differs from traditional reinforcement learning and why human feedback can enhance AI performance in various domains.

Introduction to RLHF

Text generation with RLHF

Classifying generated text for RLHF

RL vs. RLHF

Exploring pre-trained LLMs

Tokenize a text dataset

Fine-tuning for review classification

Preparing data for RLHF

Preparing the preference dataset

Extracting prompts
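As a rough illustration of the "Preparing the preference dataset" and "Extracting prompts" topics, the sketch below assumes a preference record that stores two full texts (prompt plus response) under the hypothetical keys `chosen` and `rejected`, and recovers the shared prompt as their common leading text. The record layout and the helper name `extract_prompt` are assumptions for illustration, not the course's own code.

```python
import os

def extract_prompt(chosen: str, rejected: str) -> str:
    """Return the longest leading text shared by both completions."""
    # commonprefix compares character by character, so it also works on plain strings.
    return os.path.commonprefix([chosen, rejected])

# Hypothetical preference record: one prompt, a preferred and a non-preferred answer.
record = {
    "chosen": "Q: Is the gym open on Sunday? A: Yes, from 8am to 6pm.",
    "rejected": "Q: Is the gym open on Sunday? A: I don't know.",
}

prompt = extract_prompt(record["chosen"], record["rejected"])
chosen_response = record["chosen"][len(prompt):]
rejected_response = record["rejected"][len(prompt):]
```

A character-level prefix over-extends when both responses happen to start with the same words, so real data usually needs an explicit separator or a dedicated prompt field.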

2

Gathering Human Feedback

In this chapter, discover how to set up systems for gathering human feedback. Learn best practices for collecting high-quality data, from pairwise comparisons to uncertainty sampling, and explore strategies for improving your data collection.

Methods for high-quality feedback gathering

Understanding comparison and rating in RLHF

Comparing slogans for a gym campaign

Measuring feedback quality and relevance

Low confidence

K-means for feedback clustering

Active learning

Implementing an active learning pipeline

Active learning loop
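Uncertainty sampling, mentioned in this chapter's description, can be sketched in a few lines. The example assumes a classifier has already produced class probabilities for each unlabeled sample, and it uses the least-confidence criterion, which is one of several common choices. The function name `least_confident` and the sample data are hypothetical.

```python
def least_confident(probabilities, k):
    """Return indices of the k samples whose top class probability is lowest."""
    confidence = [max(p) for p in probabilities]
    # Sort sample indices from least to most confident prediction.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]

# Hypothetical predicted probabilities for four unlabeled samples (two classes).
probs = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.51, 0.49]]

# The two samples the model is least sure about go to human annotators first.
to_label = least_confident(probs, k=2)
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model retrained before the next selection round.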

3

Tuning Models with Human Feedback

In this chapter, you'll get into the core of Reinforcement Learning from Human Feedback training: fine-tuning with PPO, techniques for training efficiently, and handling cases where training diverges from the objectives your metrics track.

Reward models explored

Initializing the reward

Setting up the reward trainer

Training with PPO

Initialize the PPO trainer

PPO fine-tuning

Efficient fine-tuning in RLHF

Prepare for 8-bit Training

Train with LoRA
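Reward models in RLHF are commonly trained on pairs of responses with a pairwise (Bradley-Terry style) loss, which is small when the preferred response scores higher than the rejected one. The sketch below shows that loss for a single pair of scalar rewards. It is a minimal illustration under that assumption, not necessarily the exact objective used in the course exercises.

```python
import math

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) in a stable way."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical reward scores for one preference pair.
good_ranking = pairwise_loss(2.0, -1.0)   # chosen scored higher: small loss
bad_ranking = pairwise_loss(-1.0, 2.0)    # chosen scored lower: large loss
```

Minimizing this loss over many pairs pushes the reward model to score preferred responses above rejected ones.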

4

Model Evaluation

In this last chapter of Reinforcement Learning from Human Feedback (RLHF), explore key techniques for assessing and improving model performance. From fine-tuning metrics to incorporating diverse feedback sources, you'll get a toolkit for refining your models effectively.

Model metrics and adjustments

Mitigating negative KL divergence

Checking the reward model

Incorporating diverse feedback sources

Majority voting on multiple data sources

Unreliable data source identification

Evaluating RLHF models

Interpreting curves

Evaluating RLHF with metrics

Wrapping up your RLHF journey
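The "Majority voting on multiple data sources" and "Unreliable data source identification" topics can be sketched together: take the most common label per item as the consensus, then measure how often each source agrees with it. The source names, labels, and helper functions below are hypothetical, and agreement with the majority is only one possible reliability signal.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label, or None if empty or tied for first place."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def agreement_rates(votes_by_source):
    """Return (consensus per item, share of decided items each source matched)."""
    sources = list(votes_by_source)
    per_item = zip(*(votes_by_source[s] for s in sources))
    consensus = [majority_vote(item) for item in per_item]
    rates = {}
    for s in sources:
        # Skip items with no clear majority when scoring a source.
        scored = [(v, c) for v, c in zip(votes_by_source[s], consensus) if c is not None]
        rates[s] = sum(v == c for v, c in scored) / len(scored) if scored else 0.0
    return consensus, rates

# Hypothetical feedback from three sources on the same four responses.
votes = {
    "annotator_a": ["good", "bad", "good", "bad"],
    "annotator_b": ["good", "bad", "good", "good"],
    "annotator_c": ["bad", "good", "good", "bad"],
}
consensus, rates = agreement_rates(votes)
```

A source whose agreement rate is far below the others, such as `annotator_c` here, is a candidate for review or down-weighting.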
