Code-along GPT-3.5 Fine-tuning
  • Code-along 2023-12-22: Fine-tuning GPT-3.5 with the OpenAI API — Questions

    Some background context

    Fine-tuning lets you customize a model for new tasks. By fine-tuning GPT-3.5, you can improve the accuracy of its responses, give it a particular tone of voice, have it discuss niche topics, and more. This first part of a two-part series covers how to use the OpenAI API and Python to get started fine-tuning GPT-3.5.

    Data

    This case study uses the Yahoo Non-Factoid Question Dataset, derived from Yahoo's Webscope L6 collection.

    • It has 87,361 questions and their corresponding answers.
    • Freely available from Hugging Face.

    Main tasks

    The main tasks include:

    • Loading the data from Hugging Face
    • Preprocessing the data for fine-tuning
    • Fine-tuning the GPT-3.5 model
    • Interacting with the fine-tuned model
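    The preprocessing step converts each question-answer pair into OpenAI's chat fine-tuning format: a JSON object containing a list of system/user/assistant messages, written one per line in a JSONL file. As a rough sketch of what that conversion looks like (the system prompt and helper name here are illustrative assumptions, not the code-along's exact choices):

    ```python
    import json

    # Illustrative system prompt; the actual prompt used in the code-along may differ.
    SYSTEM_PROMPT = "You are a helpful assistant that answers questions."

    def to_chat_example(question, answer):
        """Wrap one Q&A pair in OpenAI's chat fine-tuning message format."""
        return {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }

    # One JSON object per line (JSONL), as the fine-tuning endpoint expects.
    pairs = [("Why is the sky blue?", "Air scatters blue light more than red.")]
    jsonl = "\n".join(json.dumps(to_chat_example(q, a)) for q, a in pairs)
    print(jsonl)
    ```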

    Target audience

    This case study would be of interest to:

    • AI and Machine Learning Enthusiasts
    • Data Scientists and Analysts
    • Academics and Students
    • Industry Professionals
    • Software Developers

    Key takeaways:

    • When fine-tuning large language models can be beneficial
    • How to use the fine-tuning tools in the OpenAI API
    • The workflow for fine-tuning

    Task 0: Installing and Importing Relevant Packages

    The main packages that need to be installed are:

    • datasets: to load datasets from Hugging Face.
    • openai: to interact with OpenAI models and their built-in functions.
    • time: to pause (sleep) while waiting for the fine-tuning job to finish.
    • random: to select random observations from the training data.
    • json: to handle the training and validation data, which is in JSON format.

    Instructions

    Complete the following tasks to finish the package installation:

    • Upgrade pip with the --upgrade option, using python3 -m pip
    • Install the datasets package
    • Install version 0.28 of the openai package
    %%bash 
    python3 -m pip install --upgrade pip
    pip -q install -U datasets
    pip -q install openai==0.28
    • Note: Restart the kernel from the top-left tab by selecting
      • Run > Restart kernel
    • This ensures that all the changes take effect
    • Import the following packages
      • FineTuningJob and ChatCompletion from openai
      • load_dataset function from datasets
      • sleep from time
      • random
      • json
    from openai import FineTuningJob, ChatCompletion
    from datasets import load_dataset 
    from time import sleep
    import random 
    import json
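    The sleep import above is typically used to poll the fine-tuning job until it completes. A minimal sketch of that pattern (the helper name and the set of terminal states are assumptions; in the real workflow, get_status would wrap something like FineTuningJob.retrieve(job_id)["status"] from openai 0.28):

    ```python
    from time import sleep

    def wait_for_job(get_status, poll_seconds=10):
        """Poll a status callable until the job reaches a terminal state.

        get_status: zero-argument callable returning the job's status string;
        with openai==0.28 it could wrap FineTuningJob.retrieve(job_id)["status"].
        """
        while True:
            status = get_status()
            if status in ("succeeded", "failed", "cancelled"):
                return status
            sleep(poll_seconds)

    # Stubbed demo: statuses a real job might report over successive polls.
    fake_statuses = iter(["running", "running", "succeeded"])
    print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))  # succeeded
    ```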

    Task 1: Data Loading

    In this section, you will load the yahoo_answers_qa dataset from Hugging Face using the load_dataset function.

    Instructions

    • Acquire the train split of the yahoo_answers_qa data
    yahoo_answers_qa = load_dataset("yahoo_answers_qa", split="train")
    • Check the features/columns and the total number of rows of the data
    yahoo_answers_qa
    • From the above command, you will notice that the dataset has 87362 rows; that much data can take a long time to process, especially during fine-tuning.
    • For simplicity's sake, let's use a subset of 150 rows from the previously loaded dataset.
      • Use the .select and the range functions to select a subset of 150 rows
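    A minimal sketch of that selection, demonstrated on a toy in-memory Dataset so it runs without downloading anything; on the real data the equivalent call would be yahoo_answers_qa.select(range(150)):

    ```python
    from datasets import Dataset

    # Toy stand-in for yahoo_answers_qa; the same .select(range(n)) pattern applies.
    toy = Dataset.from_dict({
        "question": [f"question {i}" for i in range(10)],
        "answer": [f"answer {i}" for i in range(10)],
    })

    subset = toy.select(range(3))   # keeps the first 3 rows
    print(len(subset))              # 3
    print(subset["question"][0])    # question 0
    ```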