Code-along 2023-12-22: Fine-tuning GPT-3.5 with the OpenAI API
Some background context
Fine-tuning models lets you customize them for new tasks. By fine-tuning GPT-3.5, you can improve the accuracy of its responses, give it a particular tone of voice, have it talk about niche topics, and more. This first part of a two-part series covers how to use the OpenAI API and Python to get started fine-tuning GPT-3.5.
Data
This case study uses the Yahoo Non-Factoid Question Dataset, derived from Yahoo's Webscope L6 collection.
- It has 87,361 questions and their corresponding answers.
- It is freely available from Hugging Face.
Main tasks
The main tasks include:
- Loading data from Hugging Face
- Preprocessing the data for fine-tuning
- Fine-tuning the GPT-3.5 model
- Interacting with the fine-tuned model
Target audience
This case study would be of interest to:
- AI and Machine Learning Enthusiasts
- Data Scientists and Analysts
- Academics and Students
- Industry Professionals
- Software Developers
Key takeaways:
- Learn when fine-tuning large language models can be beneficial
- Learn how to use the fine-tuning tools in the OpenAI API
- Understand the workflow for fine-tuning
Task 0: Installing and Importing Relevant Packages
The main packages that need to be installed are:
- datasets: to load datasets from Hugging Face.
- openai: to interact with OpenAI models and their built-in functions.
- time: to track the fine-tuning time.
- random: to select random observations from the training data.
- json: the format of the training and validation data.
Instructions
Complete the following tasks to install the required packages:
- Use the --upgrade option of the pip command, run via python3, to upgrade pip
- Install the datasets package
- Install version 0.28 of the openai package
%%bash
# Upgrade pip, then install the pinned package versions
python3 -m pip install --upgrade pip
pip -q install -U datasets
pip -q install openai==0.28
- Note: Restart the kernel from the top-left tab by selecting
Run > Restart kernel
- This ensures that all the changes take effect
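After restarting, you can optionally confirm that the pinned openai version was installed. This quick check uses only the standard library and is not part of the original instructions:
from importlib.metadata import version
print(version("openai"))  # expect a 0.28.x release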
- Import the following packages:
  - FineTuningJob and ChatCompletion from openai
  - the load_dataset function from datasets
  - sleep from time
  - random
  - json
from openai import FineTuningJob, ChatCompletion
from datasets import load_dataset
from time import sleep
import random
import json
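Note: calls to the OpenAI API require an API key. The code-along environment may set this up for you; in your own environment you would typically configure it yourself. A minimal sketch, assuming the key is stored in an OPENAI_API_KEY environment variable:
import os
import openai

# Read the key from the environment rather than hard-coding it in the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]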
Task 1: Data Loading
In this section, you will load the yahoo_answers_qa dataset from Hugging Face using the load_dataset function.
Instructions
- Acquire the train split of the yahoo_answers_qa data
yahoo_answers_qa = load_dataset("yahoo_answers_qa", split="train")
- Check the features/columns and the total number of rows of the data
yahoo_answers_qa
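The printed representation should look roughly like the following (field names taken from the Hugging Face dataset card; treat the exact layout as illustrative):
# Dataset({
#     features: ['id', 'question', 'answer', 'nbestanswers', 'main_category'],
#     num_rows: 87362
# })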
- From the above command, you will notice that there are 87,362 rows in the dataset, and such a large amount of data can take a long time to process, especially during the fine-tuning process.
- For simplicity's sake, let's use a subset of 150 rows from the previously loaded dataset.
- Use the .select and the range functions to select a subset of 150 rows, as shown in the sketch below
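A minimal sketch of that step, reusing the yahoo_answers_qa object loaded above:
# Keep only the first 150 rows to speed up preprocessing and fine-tuning
yahoo_answers_qa = yahoo_answers_qa.select(range(150))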
- Use the