OpenAI's Preference Fine-Tuning: A Guide With Examples

Learn how to align large language models with user preferences using OpenAI's preference fine-tuning and direct preference optimization.
Jan 7, 2025  · 8 min read

Preference fine-tuning (PFT) is a technique that has long been used to fine-tune LLMs, and OpenAI recently introduced it as part of its "12 Days of OpenAI" feature rollout.

Along with supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT), PFT is another way to align large language models with user preferences.

In this article, I will quickly explain preference fine-tuning and show you how to use it with minimal effort in OpenAI’s developer dashboard.

What Is OpenAI's Preference Fine-Tuning?

While the goal of standard supervised fine-tuning is to have the model return specific outputs for specific inputs, preference fine-tuning (PFT) focuses on steering the model toward the responses we like and away from the responses we don’t.

Direct preference optimization (DPO) is the standard approach to PFT and has gained popularity as an alignment method thanks to its simplicity and effectiveness.

Unlike reinforcement learning from human feedback (RLHF), DPO does not require a complex reward model to align the large language model (LLM) but instead treats the alignment problem as a “direct” optimization of a loss function. This makes DPO easier to implement and more computationally efficient.
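To make the “direct” optimization concrete, here is a minimal sketch of the DPO loss for a single training pair. It assumes you already have the summed token log-probabilities of the preferred and non-preferred responses under both the policy being trained and a frozen reference model; the function and argument names are illustrative, not part of any library.

import math

def dpo_loss(policy_logp_pref, policy_logp_rej, ref_logp_pref, ref_logp_rej, beta=0.1):
    # How much more the policy favors each response than the reference model does
    pref_margin = policy_logp_pref - ref_logp_pref
    rej_margin = policy_logp_rej - ref_logp_rej
    # -log(sigmoid(beta * difference)) written in a numerically stable form
    diff = beta * (pref_margin - rej_margin)
    return math.log1p(math.exp(-diff))

The loss shrinks as the policy assigns relatively more probability to the preferred response than to the non-preferred one, which is exactly the behavior PFT is meant to encourage.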

The dataset required for DPO, like other alignment methods, includes a pair of “preferred” and “not preferred” responses for a given prompt. In the case of OpenAI’s PFT, similar to other fine-tuning methods, the dataset must be in JSONL format and have the following structure:

{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "Hello, can you tell me how cold San Francisco is today?"
      }
    ],
    "tools": [],
    "parallel_tool_calls": true
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": "Today in San Francisco, it is not quite cold as expected. Morning clouds will give away to sunshine, with a high near 68°F (20°C) and a low around 57°F (14°C)."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": "It is not particularly cold in San Francisco today."
    }
  ]
}

The official OpenAI documentation for PFT recommends combining SFT and PFT for better alignment: alignment techniques such as PFT are typically applied after the model has first been fine-tuned on a supervised dataset, and this SFT-then-PFT pipeline is considered an industry standard.

How to Format the Dataset for Preference Fine-Tuning

Before anything, we need to ensure our dataset is in the required format for OpenAI’s preference fine-tuning.

There are multiple ways to create a preference dataset. One method is to collect pairs of LLM outputs generated with different temperatures, top_k values, or system prompts, and then use another LLM, preferably a more capable one, to label one output as “preferred” and the other as “not preferred” (a rough sketch of this approach follows below).
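For illustration, here is how that approach might look with the OpenAI Python SDK: two candidates are sampled at different temperatures, and a stronger judge model picks the better one. The model names and the judging prompt are placeholders; adapt them to your use case.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def candidate(prompt, temperature):
    # Sample one candidate response at the given temperature
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

prompt = "Explain what a JSONL file is in one sentence."
response_a = candidate(prompt, temperature=0.2)
response_b = candidate(prompt, temperature=1.0)

# Ask a more capable model to judge which candidate is better
judge = client.chat.completions.create(
    model="gpt-4o",  # placeholder judge model
    messages=[{
        "role": "user",
        "content": (
            f"Prompt: {prompt}\n\nResponse A: {response_a}\n\n"
            f"Response B: {response_b}\n\n"
            "Answer with only 'A' or 'B' to indicate the better response."
        ),
    }],
)
print(judge.choices[0].message.content)

The winning response then goes into preferred_output and the other into non_preferred_output.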

For the demo in this tutorial, I will download a preference dataset from Hugging Face and restructure it. I will use argilla/ultrafeedback-binarized-preferences, keeping only the first 50 rows.

An example of a preference dataset from Hugging Face.

A short Python script is enough to convert the dataset's structure into OpenAI’s fine-tuning format, but you can follow a different approach depending on your application.

import datasets
import copy
import json

# Template matching OpenAI's preference fine-tuning (DPO) record structure
instance_structure = {
  "input": {
    "messages": [
      {
        "role": "user",
        "content": ""
      }
    ],
    "tools": [],
    "parallel_tool_calls": True
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": ""
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": ""
    }
  ]
}

# Load the Hugging Face preference dataset and keep the first 50 rows
ds = datasets.load_dataset("argilla/ultrafeedback-binarized-preferences", trust_remote_code=True, split="train")
ds_sample = ds.select(range(50))

ds_list = []
for sample in ds_sample:
  # Fill a fresh copy of the template with the prompt and the two responses
  instance = copy.deepcopy(instance_structure)
  instance["input"]["messages"][0]["content"] = sample["instruction"]
  instance["preferred_output"][0]["content"] = sample["chosen_response"]
  instance["non_preferred_output"][0]["content"] = sample["rejected_response"]
  ds_list.append(instance)

# Write one JSON object per line (JSONL)
with open("preference_dataset.jsonl", "w") as out:
  for json_line in ds_list:
    out.write(json.dumps(json_line) + "\n")

Make sure your dataset file is in JSONL format, not JSON. The script may also leave an empty line at the end of the file, so remove it before uploading.
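As a quick sanity check (assuming the file name from the script above), you can read the file back and confirm that every non-empty line parses as a standalone JSON object:

import json

with open("preference_dataset.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        if line.strip():
            json.loads(line)  # raises an error if this line is not valid JSON
print("All lines parsed successfully.")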

How to Run OpenAI's Preference Fine-Tuning

With your dataset prepared, you can follow these steps to run preference fine-tuning:

  1. Go to OpenAI’s dashboard.
  2. Look for the fine-tuning section and select Create to make a new fine-tuning job.
  3. Choose direct preference optimization as your method for fine-tuning.
  4. Upload your training data and, optionally, a validation set.

Fine-tuning screenshot from the OpenAI dashboard.

OpenAI allows you to specify hyperparameters, but you can also leave them empty and let the system choose them automatically; if you would rather skip the dashboard entirely, a programmatic alternative is sketched below. Depending on your dataset size, the training can take a while.
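The sketch below shows the general shape of the API calls: upload the JSONL file, then create a fine-tuning job with the DPO method. Treat the exact method payload, the hyperparameter names (such as beta), and the base model string as assumptions to verify against OpenAI's current fine-tuning documentation.

from openai import OpenAI

client = OpenAI()

# Upload the preference dataset prepared earlier
training_file = client.files.create(
    file=open("preference_dataset.jsonl", "rb"),
    purpose="fine-tune",
)

# Create a preference fine-tuning (DPO) job; hyperparameters can be omitted
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # placeholder base model, check supported models
    method={
        "type": "dpo",
        "dpo": {"hyperparameters": {"beta": 0.1}},
    },
)
print(job.id)

You can then monitor the job's progress from the same dashboard.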

Conclusion

Direct preference optimization is a useful new addition to OpenAI’s fine-tuning toolbox that allows you to explicitly control the tone and style of your model’s responses and align them more closely with user preferences.

As with OpenAI’s distillation, reinforcement fine-tuning, and supervised fine-tuning, your main contribution in preference fine-tuning is preparing the dataset according to OpenAI’s format and structure; their tools take care of the rest.

To learn more, I recommend exploring OpenAI’s official fine-tuning documentation.


Author
Hesam Sheikh Hassani

Master's student of Artificial Intelligence and AI technical writer. I share insights on the latest AI technology, making ML research accessible, and simplifying complex AI topics necessary to keep you at the forefront.
