
GPT-4o API Tutorial: Getting Started with OpenAI's API

To connect through the GPT-4o API, obtain your API key from OpenAI, install the OpenAI Python library, and use it to send requests and receive responses from the GPT-4o models.
Updated Sep 29, 2024  · 8 min read

OpenAI's GPT-4o integrates audio, vision, and text capabilities into a single, powerful language model.

This development marks a significant move towards more natural and intuitive human-computer interaction.

In this tutorial, I’ll provide step-by-step guidance on using GPT-4o through the OpenAI API.

Even though OpenAI recently released the o1 model—its most capable reasoning model—GPT-4o and GPT-4o mini remain the best options for applications that require quick responses, image handling, or function calling. If your project demands advanced reasoning, be sure to check out this OpenAI O1 API tutorial.

What Is GPT-4o?

GPT-4o, where the "o" stands for "omni," represents a significant advancement in AI. Unlike GPT-4, which handles only text, GPT-4o is a multimodal model that processes and generates text, audio, and visual data.

GPT-4o comparison with GPT-4 Turbo

By embracing audio and visual data alongside text, GPT-4o breaks free from the constraints of traditional text-only models, creating more natural and intuitive interactions.

GPT-4o has a faster response time, is 50% cheaper than GPT-4 Turbo, and is better at audio and vision understanding than existing models.

If you want to get a more detailed overview of GPT-4o, check out this article on What Is OpenAI’s GPT-4o.

GPT-4o Use Cases

In addition to the ChatGPT interface, developers can access GPT-4o through the OpenAI API, enabling them to integrate its capabilities into their own applications and systems.

The GPT-4o API opens up a vast array of potential use cases thanks to its multimodal capabilities:

| Modality | Use Cases | Description |
| --- | --- | --- |
| Text | Text generation, text summarization, data analysis & coding | Content creation, concise summaries, code explanations, and coding assistance. |
| Audio | Audio transcription, real-time translation, audio generation | Convert audio to text, translate in real time, and power virtual assistants or language learning. |
| Vision | Image captioning, image analysis & logic, accessibility for the visually impaired | Describe images, analyze visual information, and improve accessibility for visually impaired users. |
| Multimodal | Multimodal interactions, roleplay scenarios | Seamlessly combine modalities to create immersive experiences. |

GPT-4o API: How to Connect to OpenAI’s API

Let's now explore how to use GPT-4o through the OpenAI API.

Step 1: Generate an API key

Before using the GPT-4o API, we must sign up for an OpenAI account and obtain an API key. We can create an account on the OpenAI API website.

Once we have an account, we can navigate to the API keys page:


We can now generate an API key. We need to keep it safe, as we won't be able to view it again. But we can always generate a new one if we lose it or need one for a different project.


Step 2: Import the OpenAI API into Python

To interact with the GPT-4o API programmatically, we'll need to install the OpenAI Python library. We can do this by running the following command:

pip install openai

Once installed, we can import the necessary modules into our Python script:

from openai import OpenAI

Step 3: Make an API call

Before we can make API requests, we'll need to authenticate with our API key:

# Set the API key
client = OpenAI(api_key="your_api_key_here")

Replace "your_api_key_here" with your actual API key.

After completing the client connection, we can start generating text using GPT-4o:

MODEL="gpt-4o"

completion = client.chat.completions.create(
  model=MODEL,
  messages=[
    {"role": "system", "content": "You are a helpful assistant that helps me with my math homework!"},
    {"role": "user", "content": "Hello! Could you solve 20 x 5?"}
  ]
)
print("Assistant: " + completion.choices[0].message.content)

This snippet calls the Chat Completions API with the GPT-4o model, sending a math question as input and printing the generated response:

GPT-4o code output
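
Where responsiveness matters, the same Chat Completions endpoint can stream the reply token by token instead of returning it all at once. Here's a minimal sketch, reusing the client and MODEL defined above:

# Stream the response chunk by chunk for lower perceived latency
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Hello! Could you solve 20 x 5?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant's reply
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")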

GPT-4o API: Audio Use Cases

Audio transcription and summarization have become essential tools in various applications, from improving accessibility to enhancing productivity. With the GPT-4o API, we can efficiently handle tasks such as transcribing and summarizing audio content.

While GPT-4o has the potential to handle audio directly, the direct audio input feature isn't yet available through the API. For now, we can use a two-step process with the GPT-4o API to transcribe and then summarize audio content.

Step 1: Transcribe audio to text

To transcribe an audio file, we must first provide the audio data to the API. Here's an example:

# Transcribe the audio with OpenAI's whisper-1 model
audio_path = "path/to/audio.mp3"
with open(audio_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

Replace "path/to/audio.mp3" with the actual path to your audio file. This example uses the whisper-1 model for transcription.

Step 2: Summarize audio text

Next, we pass the transcript to GPT-4o and ask for a Markdown summary:

# Summarize the transcript with GPT-4o
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are generating a transcript summary. Create a summary of the provided transcription. Respond in Markdown."},
        {"role": "user", "content": [
            {"type": "text", "text": f"The audio transcription is: {transcription.text}"}
        ]},
    ],
    temperature=0,
)
print(response.choices[0].message.content)

GPT-4o API: Vision Use Cases

Visual data analysis is crucial in various domains, from healthcare to security and beyond. With the GPT-4o API, you can seamlessly analyze images, engage in conversations about visual content, and extract valuable information from images.

Step 1: Add image data to the API

To analyze an image using GPT-4o, we must first provide the image data to the API. We can do this by either encoding a local image as a base64 string or providing a URL to an online image:

import base64

IMAGE_PATH = "image_path"

# Open the image file and encode it as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(IMAGE_PATH)
 "url": "<https://images.saymedia-content.com/.image/c_limit%2Ccs_srgb%2Cq_auto:eco%2Cw_538/MTczOTQ5NDQyMzQ3NTc0NTc5/compound-shapes-how-to-find-the-area-of-a-l-shape.webp>"

Step 2: Analyze the image data

Once we have processed the image input, we can pass the image data to the API for analysis.

Let’s try to analyze an image to determine the area of a shape. Let’s first use the image below:

Shape for GPT-4o to calculate

We'll now ask GPT-4o to find the area of this shape. Notice we're using a base64 image input below:

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
        {"role": "user", "content": [
            {"type": "text", "text": "What's the area of the shape in this image?"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{base64_image}"}
            }
        ]}
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)

Let’s now consider this shape:

Shape for GPT-4o to calculate

We’ll pass the image URL to GPT-4o to find the area of the shape:

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
        {"role": "user", "content": [
            {"type": "text", "text": "What's the area of the shape in the image?"},
            {"type": "image_url", "image_url": {
                "url": "<https://images.saymedia-content.com/.image/c_limit%2Ccs_srgb%2Cq_auto:eco%2Cw_538/MTczOTQ5NDQyMzQ3NTc0NTc5/compound-shapes-how-to-find-the-area-of-a-l-shape.webp>"}
            }
        ]}
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)

Notice that GPT-4o incorrectly measured the width of the vertical rectangle—it should be four centimeters, not two. This discrepancy arises from the misalignment between the measurement labels and the rectangle's actual proportions. If anything, this highlights once again the importance of human supervision and validation.

GPT-4o API Pricing

OpenAI has introduced a competitive pricing structure for the GPT-4o API, making it more accessible and cost-effective than previous models.

Here's a summary of the pricing alongside Anthropic's Claude and Google's Gemini models (prices in US dollars):

GPT-4o price comparison

As you can see, GPT-4o is priced significantly lower than both GPT-4 Turbo and GPT-4. It’s also competitively priced compared to other state-of-the-art language models like Claude Opus and Gemini 1.5 Pro.

GPT-4o API: Key Considerations

When working with the GPT-4o API, it's important to remember a few key considerations to ensure optimal performance, cost-effectiveness, and alignment with each specific use case. Here are three crucial factors to consider:

Pricing and cost management

The OpenAI API follows a pay-per-use model, where costs are incurred based on the number of tokens processed.

Although GPT-4o is cheaper than GPT-4 Turbo, it's still crucial to plan our usage so we can estimate and manage costs.

To minimize costs, you can consider techniques like batching and optimizing prompts to reduce the number of API calls and tokens processed.
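
One way to estimate costs before sending a request is to count tokens locally with the tiktoken library. Here's a minimal sketch, assuming a recent tiktoken version that knows the gpt-4o encoding:

import tiktoken

# Load the tokenizer that gpt-4o uses (o200k_base)
encoding = tiktoken.encoding_for_model("gpt-4o")

prompt = "Hello! Could you solve 20 x 5?"
num_tokens = len(encoding.encode(prompt))
print(f"Prompt tokens: {num_tokens}")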

Latency and performance

Even though GPT-4o offers impressive performance and low latency, it’s still a large language model, which means that processing requests can be computationally intensive, leading to relatively high latency.

We need to optimize our code and use techniques like caching and asynchronous processing to mitigate latency issues.
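
For instance, the OpenAI Python library ships an asynchronous client that lets several requests run concurrently. The sketch below is illustrative (the two questions are just placeholders):

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI(api_key="your_api_key_here")

async def ask(question: str) -> str:
    # Each request awaits the API without blocking the others
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def main():
    # Fire both requests concurrently and collect the answers
    answers = await asyncio.gather(ask("What is 20 x 5?"), ask("What is 7 x 8?"))
    for answer in answers:
        print(answer)

asyncio.run(main())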

Additionally, we can explore using OpenAI's dedicated instances or fine-tuning the model to our specific use case, which can improve performance and reduce latency.

Use case alignment

GPT-4o is a powerful general model with a wide range of capabilities, but we need to ensure that our specific use case aligns with the model's strengths.

Before relying solely on GPT-4o, we must carefully evaluate our use case and consider whether the model's capabilities suit our needs.

If necessary, we could fine-tune smaller models or explore other models that may be better suited for our particular task.

Conclusion

GPT-4o's multimodal capabilities address the limitations of earlier models that struggled to integrate and process different types of data seamlessly.

By leveraging the GPT-4o API, developers can build innovative solutions that seamlessly integrate text, audio, and visual data.

If you want to get more practice with GPT-4o, I recommend this code-along on creating AI assistants with GPT-4o. Similarly, if you want to learn more about working with APIs, I recommend these resources:

- DataCamp's Working with the OpenAI API course
- The OpenAI Cookbook
- DataCamp's OpenAI API cheat sheet

FAQs

What is GPT-4o and how does it differ from previous models?

GPT-4o is a multimodal language model developed by OpenAI, capable of processing and generating text, audio, and visual data. Unlike previous models like GPT-4, which only handled text, GPT-4o integrates audio and visual information, enabling more natural interactions and enhanced capabilities across modalities.

How can developers access GPT-4o through the OpenAI API?

Developers can access GPT-4o through the OpenAI API by signing up for an OpenAI account, obtaining an API key, and installing the OpenAI Python library.

What are the costs of using the GPT-4o API, and how does it compare to other models?

The GPT-4o API follows a pay-per-use model, with costs incurred based on the number of tokens processed. Compared to previous models like GPT-4, GPT-4o offers a 50% reduction in costs, making it more affordable. A pricing comparison with other models is provided in the article.

Can GPT-4o be fine-tuned for specific use cases or industries?

Yes, GPT-4o can be fine-tuned for specific use cases or industries through techniques like transfer learning. By fine-tuning on domain-specific data or tasks, developers can enhance the model's performance and tailor it to their unique requirements.

What resources are available for further learning and implementation of the GPT-4o API?

Various resources, including tutorials, courses, and practical examples, are available for further learning and implementing the GPT-4o API. The article recommends exploring DataCamp’s Working with the OpenAI API course, the OpenAI Cookbook, and DataCamp’s cheat sheet for quick reference and practical implementation guidance.

When should I use GPT-4o versus GPT-4o-mini?

GPT-4o is ideal for more complex use cases that require in-depth analysis, language understanding, or longer interactions. On the other hand, GPT-4o mini is faster and more cost-effective, making it better suited for lightweight tasks or when a quick response is needed. Both models offer multimodal capabilities, but GPT-4o excels when more advanced reasoning and interaction across modalities are essential.

How does the GPT-4o API compare to the o1 API for specific use cases?

While GPT-4o is excellent for tasks that involve multimodal data (text, audio, and images), the o1 API shines in complex reasoning and problem-solving tasks, particularly for science, coding, and mathematics. If you need fast responses with moderate reasoning, GPT-4o is your best bet. However, for tasks requiring in-depth logical analysis and accuracy, like generating intricate code or solving advanced math problems, the o1 API offers stronger capabilities.


Author
Ryan Ong

Ryan is a lead data scientist specialising in building AI applications using LLMs. He is a PhD candidate in Natural Language Processing and Knowledge Graphs at Imperial College London, where he also completed his Master’s degree in Computer Science. Outside of data science, he writes a weekly Substack newsletter, The Limitless Playbook, where he shares one actionable idea from the world's top thinkers and occasionally writes about core AI concepts.
