OpenAI's GPT-4o integrates audio, vision, and text capabilities into a single, powerful language model.
This development marks a significant move towards more natural and intuitive human-computer interaction.
In this tutorial, I’ll provide step-by-step guidance on using GPT-4o through the OpenAI API.
Even though OpenAI recently released the o3 model—its most capable reasoning model—GPT-4o and GPT-4o mini remain the best options for applications that require quick responses, image handling, or function calling. If your project demands advanced reasoning, be sure to check out this OpenAI o1 API tutorial, which walks you through the process of using a reasoning model via the API.
What Is GPT-4o?
GPT-4o, where the "o" stands for "omni," represents a significant advancement in AI. Unlike GPT-4, which only handles text, GPT-4o is a multimodal model that processes and generates text, audio, and visual data.
By embracing audio and visual data alongside text, GPT-4o breaks free from the constraints of traditional text-only models, creating more natural and intuitive interactions.
GPT-4o has a faster response time, is 50% cheaper than GPT-4 Turbo, and is better at audio and vision understanding than existing models.
If you want to get a more detailed overview of GPT-4o, check out this article on What Is OpenAI’s GPT-4o.
GPT-4o Use Cases
In addition to using GPT-4o through the ChatGPT interface, developers can access it through the OpenAI API, which lets them integrate GPT-4o's capabilities into their own applications and systems.
The GPT-4o API's multimodal capabilities open up a vast array of potential use cases:
| Modality | Use Cases | Description |
| --- | --- | --- |
| Text | Text generation, text summarization, data analysis & coding | Content creation, concise summaries, code explanations, and coding assistance. |
| Audio | Audio transcription, real-time translation, audio generation | Convert audio to text, translate in real time, and build virtual assistants or language-learning tools. |
| Vision | Image captioning, image analysis & logic, accessibility for the visually impaired | Describe images, analyze visual information, and provide accessibility for visually impaired users. |
| Multimodal | Multimodal interactions, roleplay scenarios | Seamlessly combine modalities to create immersive experiences. |
GPT-4o API: How to Connect to OpenAI’s API
Let's now explore how to use GPT-4o through the OpenAI API.
Step 1: Generate an API key
Before using the GPT-4o API, we must sign up for an OpenAI account and obtain an API key. We can create an account on the OpenAI API website.
Once we have an account, we can navigate to the API keys page:
We can now generate an API key. We need to keep it safe, as we won't be able to view it again. But we can always generate a new one if we lose it or need one for a different project.
Step 2: Import the OpenAI API into Python
To interact with the GPT-4o API programmatically, we'll need to install the OpenAI Python library. We can do this by running the following command:
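pip install openai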
Once installed, we can import the necessary modules into our Python script:
from openai import OpenAI
Step 3: Make an API call
Before we can make API requests, we'll need to authenticate with our API key:
# Set the API key
client = OpenAI(api_key="your_api_key_here")
Replace "your_api_key_here"
with your actual API key.
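As a safer alternative, we can keep the key out of our source code and load it from an environment variable (this sketch assumes you have already exported OPENAI_API_KEY in your shell):

import os
from openai import OpenAI

# Read the API key from the environment instead of hard-coding it
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))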
Once the client is set up, we can start generating text using GPT-4o:
MODEL = "gpt-4o"

completion = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that helps me with my math homework!"},
        {"role": "user", "content": "Hello! Could you solve 20 x 5?"}
    ]
)
print("Assistant: " + completion.choices[0].message.content)
This code snippet sends a math question to the chat completions API using the GPT-4o model and prints the assistant's response:
GPT-4o API: Audio Use Cases
Audio transcription and summarization have become essential tools in various applications, from improving accessibility to enhancing productivity. With the GPT-4o API, we can efficiently handle tasks such as transcribing and summarizing audio content.
While GPT-4o has the potential to handle audio directly, the direct audio input feature isn't yet available through the API. For now, we can use a two-step process with the GPT-4o API to transcribe and then summarize audio content.
Step 1: Transcribe audio to text
To transcribe an audio file using GPT-4o, we must provide the audio data to the API. Here's an example:
# Transcribe the audio
audio_path = "path/to/audio.mp3"
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=open(audio_path, "rb"),
)
Replace "path/to/audio.mp3"
with the actual path to your audio file. This example uses the whisper-1
model for transcription.
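If you want to inspect the raw transcript before summarizing it, you can print the text attribute of the transcription result (the same attribute is used in the next step):

print(transcription.text)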
Step 2: Summarize audio text
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": """You are generating a transcript summary. Create a summary of the provided transcription. Respond in Markdown."""
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"The audio transcription is: {transcription.text}"}
            ],
        }
    ],
    temperature=0,
)
print(response.choices[0].message.content)
GPT-4o API: Vision Use Cases
Visual data analysis is crucial in various domains, from healthcare to security and beyond. With the GPT-4o API, you can seamlessly analyze images, engage in conversations about visual content, and extract valuable information from images.
Step 1: Add image data to the API
To analyze an image using GPT-4o, we must first provide the image data to the API. We can do this by either encoding a local image as a base64 string or providing a URL to an online image:
import base64

IMAGE_PATH = "image_path"

# Open the image file and encode it as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(IMAGE_PATH)
"url": "<https://images.saymedia-content.com/.image/c_limit%2Ccs_srgb%2Cq_auto:eco%2Cw_538/MTczOTQ5NDQyMzQ3NTc0NTc5/compound-shapes-how-to-find-the-area-of-a-l-shape.webp>"
Step 2: Analyze the image data
Once we have processed the image input, we can pass the image data to the API for analysis.
Let’s try to analyze an image to determine the area of a shape. Let’s first use the image below:
We’ll now ask GPT-4o to find the area of this shape. Notice that we’re passing the base64-encoded image as input:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the area of the shape in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
Let’s now consider this shape:
We’ll pass the image URL to GPT-4o to find the area of the shape:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the area of the shape in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.saymedia-content.com/.image/c_limit%2Ccs_srgb%2Cq_auto:eco%2Cw_538/MTczOTQ5NDQyMzQ3NTc0NTc5/compound-shapes-how-to-find-the-area-of-a-l-shape.webp"
                    }
                }
            ]
        }
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
Notice that GPT-4o incorrectly measured the width of the vertical rectangle—it should be four centimeters, not two. This discrepancy arises from the misalignment between the measurement labels and the rectangle's actual proportions. If anything, this highlights once again the importance of human supervision and validation.
GPT-4o API Pricing
As of July 2025, the pay-as-you-go pricing for GPT-4o is:
- Input Tokens: $5.00 per million tokens
- Cached Input Tokens: $2.50 per million tokens
- Output Tokens: $20.00 per million tokens
GPT-4o Mini: A cost-effective alternative
For applications requiring high-volume API calls, OpenAI offers GPT-4o Mini, a streamlined version of GPT-4o:
- Input Tokens: $0.60 per million tokens
- Cached Input Tokens: $0.30 per million tokens
- Output Tokens: $2.40 per million tokens
Batch API discounts
OpenAI’s Batch API offers a 50% discount on both input and output token costs for GPT-4o:
- Input Tokens: $2.50 per million tokens
- Output Tokens: $10.00 per million tokens
This is ideal for processing large volumes of data asynchronously.
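As a rough sketch of how this looks with the OpenAI Python library (the requests.jsonl file name and custom_id values below are placeholders you would replace with your own data), we first upload a JSONL file of requests and then create a batch job:

from openai import OpenAI

client = OpenAI()

# Each line of requests.jsonl is one chat completion request, for example:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# Create the batch job; results become available within the completion window
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)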
GPT-4o API: Key Considerations
When working with the GPT-4o API, it's important to remember a few key considerations to ensure optimal performance, cost-effectiveness, and alignment with each specific use case. Here are three crucial factors to consider:
Pricing and cost management
The OpenAI API follows a pay-per-use model, where costs are incurred based on the number of tokens processed.
Although GPT-4o is cheaper than GPT-4 Turbo, it’s still crucial to plan our usage so we can estimate and manage costs.
To minimize costs, you can consider techniques like batching and optimizing prompts to reduce the number of API calls and tokens processed.
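For example, we can estimate the input cost of a prompt before sending it by counting tokens locally with the tiktoken library (a minimal sketch, assuming a recent tiktoken version that knows the gpt-4o encoding and using the $5.00 per million input tokens rate listed above):

import tiktoken

prompt = "Hello! Could you solve 20 x 5?"

# Recent tiktoken versions map gpt-4o to the o200k_base encoding
encoding = tiktoken.encoding_for_model("gpt-4o")
num_tokens = len(encoding.encode(prompt))

# Rough input-cost estimate at $5.00 per million input tokens
estimated_cost = num_tokens / 1_000_000 * 5.00
print(f"{num_tokens} tokens, approximately ${estimated_cost:.6f}")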
Latency and performance
Even though GPT-4o offers impressive performance and lower latency than its predecessors, it’s still a large language model, which means that processing requests can be computationally intensive and response times can add up at scale.
We need to optimize our code and use techniques like caching and asynchronous processing to mitigate latency issues.
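As one illustration of asynchronous processing, the openai library ships an AsyncOpenAI client that lets us send several independent requests concurrently instead of waiting for each one in turn (a minimal sketch; the example questions are placeholders):

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def ask(question):
    # Each call awaits its own chat completion
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def main():
    questions = ["What is 20 x 5?", "What is 12 x 12?", "What is 7 x 8?"]
    # Run all requests concurrently and collect the answers in order
    answers = await asyncio.gather(*(ask(q) for q in questions))
    for question, answer in zip(questions, answers):
        print(question, "->", answer)

asyncio.run(main())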
Additionally, we can explore using OpenAI's dedicated instances or fine-tuning the model to our specific use case, which can improve performance and reduce latency.
Use case alignment
GPT-4o is a powerful general model with a wide range of capabilities, but we need to ensure that our specific use case aligns with the model's strengths.
Before relying solely on GPT-4o, we must carefully evaluate our use case and consider whether the model's capabilities suit our needs.
If necessary, we could fine-tune smaller models or explore other models that may be better suited for our particular task.
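As a rough sketch of what that can look like with the OpenAI API (assuming your account has access to fine-tuning for gpt-4o-mini, and treating training_data.jsonl as a placeholder for your own chat-formatted training file), starting a fine-tuning job takes two calls:

from openai import OpenAI

client = OpenAI()

# Upload chat-formatted training examples for fine-tuning
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on the smaller gpt-4o-mini model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)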
Conclusion
GPT-4o's multimodal capabilities address the limitations of earlier models that struggled to integrate and process different types of data seamlessly.
By leveraging the GPT-4o API, developers can build innovative solutions that seamlessly integrate text, audio, and visual data.
If you want to get more practice with GPT-4o, I recommend this code-along on creating AI assistants with GPT-4o. Similarly, if you want to learn more about working with APIs, I recommend these resources:
FAQs
What is GPT-4o and how does it differ from previous models?
GPT-4o is a multimodal language model developed by OpenAI, capable of processing and generating text, audio, and visual data. Unlike previous models like GPT-4, which only handled text, GPT-4o integrates audio and visual information, enabling more natural interactions and enhanced capabilities across modalities.
How can developers access GPT-4o through the OpenAI API?
Developers can access GPT-4o through the OpenAI API by signing up for an OpenAI account, obtaining an API key, and installing the OpenAI Python library.
What are the costs of using the GPT-4o API, and how does it compare to other models?
The GPT-4o API follows a pay-per-use model, with costs incurred based on the number of tokens processed. Compared to previous models like GPT-4, GPT-4o offers a 50% reduction in costs, making it more affordable. A pricing comparison with other models is provided in the article.
Can GPT-4o be fine-tuned for specific use cases or industries?
Yes, GPT-4o can be fine-tuned for specific use cases or industries through techniques like transfer learning. By fine-tuning on domain-specific data or tasks, developers can enhance the model's performance and tailor it to their unique requirements.
What resources are available for further learning and implementation of the GPT-4o API?
Various resources, including tutorials, courses, and practical examples, are available for further learning and implementing the GPT-4o API. The article recommends exploring DataCamp’s Working with the OpenAI API course, the OpenAI Cookbook, and DataCamp’s cheat sheet for quick reference and practical implementation guidance.
When should I use GPT-4o versus GPT-4o-mini?
GPT-4o is ideal for more complex use cases that require in-depth analysis, language understanding, or longer interactions. On the other hand, GPT-4o-mini is faster and more cost-effective, making it better suited for lightweight tasks or when a quick response is needed. Both models offer multimodal capabilities, but GPT-4o excels when more advanced reasoning and interaction across modalities are essential.
How does the GPT-4o API compare to the o1 API for specific use cases?
While GPT-4o is excellent for tasks that involve multimodal data (text, audio, and images), the o1 API shines in complex reasoning and problem-solving tasks, particularly for science, coding, and mathematics. If you need fast responses with moderate reasoning, GPT-4o is your best bet. However, for tasks requiring in-depth logical analysis and accuracy, like generating intricate code or solving advanced math problems, the o1 API offers stronger capabilities.
Ryan is a lead data scientist specialising in building AI applications using LLMs. He is a PhD candidate in Natural Language Processing and Knowledge Graphs at Imperial College London, where he also completed his Master’s degree in Computer Science. Outside of data science, he writes a weekly Substack newsletter, The Limitless Playbook, where he shares one actionable idea from the world's top thinkers and occasionally writes about core AI concepts.