
GPT-4o Image Generation: A Guide With 8 Practical Examples

Learn what GPT-4o image generation is, how to use it, and discover 8 practical examples to understand its capabilities.
Mar 27, 2025  · 8 min read

As a photographer and someone interested in art in general, I'm always intrigued when a new image-generation model comes out. OpenAI’s GPT-4o image generation truly blew me away.

I have ideas in my mind that I’d like to express visually, but sometimes I find it hard to bring them to life. I keep hoping a model will come along that can bridge the gap between reality and my vision. The new model might just be that bridge.

In this article, I'll showcase the capabilities of OpenAI's new image generation model through 8 practical examples. We've also created a video with three extra real-world use cases.

What Is GPT-4o Image Generation?

GPT-4o image generation is a new feature in the GPT-4o model that allows users to create images directly within ChatGPT. This feature brings native image generation to the platform, making it accessible for various purposes like creativity, education, and more.

The launch represents a big leap forward from prior image generation technologies, as it aims to make the creation of images more accurate, user-friendly, and useful across many situations. For instance, users can now generate images by providing specific prompts, blending images with text, or even editing images through simple instructions.

Overall, GPT-4o image generation can be used for various creative tasks, such as making comics, designing trading cards, crafting memes, or even creating educational materials that explain complex topics. For instance, I prompted ChatGPT to summarize the content of this section through an infographic:

Example infographic using GPT-4o image generation

How to Access GPT-4o Image Generation?

The GPT-4o image generation feature is available as the default image generator in ChatGPT. According to OpenAI, it is available for Plus, Pro, Team, and Free users. However, in my experience, I couldn't get it to work on my Free plan, and later OpenAI confirmed that access is not yet available on the Free plan because of the high demand.

Developers will have the opportunity to generate images with GPT-4o through the API in the coming weeks.

You can create images with GPT-4o by selecting the GPT-4o model and providing a text prompt describing what you want it to generate.

Generating an image with GPT-4o image generation

We can also keep chatting to request changes:

Editing an image with GPT-4o

GPT-4o Image Generation Examples

Now that we've covered how to use the model, let's demonstrate what it can do through eight practical examples.

OpenAI claims that this new model doesn't just generate pretty images. It is able to generate images that are actually useful in the real world. In my opinion, for an image generation model to be truly useful, it must be able to modify existing images or apply existing styles consistently.

In real-life situations, we usually don't want an image from scratch. Rather, we have a style and want to generate an image in that style, or we have a photo and need to modify it in some way. Here are a few examples:

  • A coffee shop owner wanting to post a marketing photo doesn't want an image of a random coffee shop—they want a photo of their coffee shop.
  • If I am using AI to create a visual story, I need to be able to keep a consistent character throughout the story. It's of no use if the images aren't consistent.
  • As a photographer, I have no interest in generating an image from scratch that doesn't exist in real life. Rather, I want to be able to edit an existing photograph.

1. Text

We already saw in the logo example that GPT-4o can generate text in images. Generating stand-alone text is probably the easiest example.

To test this further, I tried generating text on an object:

Example of how GPT-4o handles text on an object

This example showcases two important features:

  1. The model is able to render text on an object in a way that follows the object's shape.
  2. The model can understand colors and follow a color scheme.

To push the model further, I asked it to generate longer text and display it in the image in a readable way. Here is the result:

More complex text example

I was impressed by this. Other models I've tried in the past have not performed this task so well.

2. Transparency

GPT-4o is able to generate images with transparent areas. This is especially useful for images that are meant to be overlaid on top of other content, like stickers of characters from a game.

I took a photo of myself and asked GPT-4o to create a pixel art character based on it. Here's the result:

Generating characters and handling transparency

Note that it didn't generate a transparent background by default, but asking for it worked well and didn't alter the original result.
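
When an image is meant to be overlaid on other content, it can be worth confirming that the downloaded file really declares transparency rather than a baked-in white or checkerboard background. The following is a minimal sketch, assuming the image was saved as a PNG. The helper name `png_has_alpha` and the file name in the usage comment are hypothetical, and the check only reads the file's chunk headers, not the pixel values.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """Return True if the PNG bytes declare an alpha channel or a tRNS chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR" and len(body) >= 10:
            # Byte 9 of IHDR is the color type: 4 = gray + alpha, 6 = RGBA.
            if body[9] in (4, 6):
                return True
        elif chunk_type == b"tRNS":
            # Palette or single-color transparency.
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            # tRNS must appear before the image data, so we can stop here.
            break
        pos += 12 + length  # length + type + body + 4-byte CRC
    return False


# Hypothetical usage:
# with open("character.png", "rb") as f:
#     print(png_has_alpha(f.read()))
```

A file that passes this check can still be fully opaque, since the header only says that transparency is possible. Inspecting the actual pixels would need an image library.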

3. Character consistency

Based on the previous conversation, I tried to generate a scene using the pixel art character I had generated. This was the result:

Character consistency with GPT-4o image generation

The character in this image has a different resolution than the original one. It has more details, so it seems that GPT-4o generates a new one based on the photo rather than using the character it created before.

It's still a nice result, but it's not usable as is in a game because we need the two characters to be more consistent. At this stage, it's better as inspiration for a pixel artist rather than an end result in itself.

4. Creating a detailed story

Next, I wanted to create a comic book strip to tell the story of how I took a cityscape photo of Taipei a few months back. I used this to test how GPT-4o handles generating an image from detailed instructions.

I started by asking the model to generate a comic book character based on myself. Then, I provided the details of each frame in the comic book strip. 

Generating images with complex instructions

The first result was close to what I wanted, but not fully accurate. Also, I felt again that the model generated a new character rather than using the first one it generated.

However, I was very pleased with the result after I requested some changes. It was an interesting feeling to see that night come to life as a comic book strip.

Adjusting parts of an image

I particularly loved that it was able to mimic the photo in the last frame. I think it elevated the result.

5. Photo editing

Next, I tried photo editing. A few months ago, I was traveling back to Europe, and I took a photo before boarding the plane. Unfortunately, there was an annoying reflection on the window because I took the photo from the inside. I tried using Photoshop to remove it but didn't succeed.

I tried again using GPT-4o, and it worked really well.

Photo editing with GPT-4o image generation

Here are a few other examples of editing a photo using GPT-4o:

More examples of photo editing with GPT-4o

Again, it's not perfect but still pretty good. In the first example, the people were removed but the building in the back got modified. The night photos are nice but a little bit too dark.

Another interesting detail is that due to the conversational aspect of GPT-4o, it tends to apply the new changes to the latest image. In this case, when I requested the rain, I was expecting it to modify the original image, not the night image. 

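We can get around this by specifying the image in the prompt (for example, by asking it to apply the change to the original photo rather than the night version) or by starting a new conversation.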

6. Color grading

Most of my photo editing consists of adjusting the colors, not modifying the content of the photo.

I was curious to see how good GPT-4o was at color grading, so I experimented with color grading in one of my photos. One of my favorite movies is Blade Runner 2049, and I like the overall aesthetic of the movie, so I wanted to see if GPT-4o could color-grade one of my urban photos in that style. Here's the result:

Color grading with GPT-4o image generation

I loved the result. It saved me so much time compared to editing it myself. I also really enjoy the fact that it (mostly) preserved the integrity of the image.

In this example, we describe the desired result textually. I also tried to give it a sample image with a color palette to see if it could color-grade my photo in that style. In my opinion, it did a very good job at it.

Color grading with image style
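
For comparison, grading a photo toward a reference palette can also be roughly approximated without a generative model. The sketch below shifts each RGB channel of the source so that its mean and spread match the reference image. It is a simplification of classic statistical color transfer (which usually works in a perceptual color space), it is not a description of how GPT-4o works internally, and the function names and tiny pixel lists are hypothetical.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return (mean, population std dev) for the R, G and B channels."""
    stats = []
    for c in range(3):
        values = [p[c] for p in pixels]
        stats.append((mean(values), pstdev(values)))
    return stats


def transfer_colors(source, reference):
    """Match each channel's mean and spread in `source` to `reference`."""
    src = channel_stats(source)
    ref = channel_stats(reference)
    graded = []
    for p in source:
        out = []
        for c in range(3):
            s_mean, s_std = src[c]
            r_mean, r_std = ref[c]
            # Avoid dividing by zero when the source channel is flat.
            scale = r_std / s_std if s_std > 0 else 1.0
            value = (p[c] - s_mean) * scale + r_mean
            out.append(max(0, min(255, round(value))))  # clamp to 8-bit range
        graded.append(tuple(out))
    return graded


# Two gray source pixels graded toward a cool, blue-leaning reference.
photo = [(100, 100, 100), (200, 200, 200)]
palette = [(0, 50, 100), (100, 150, 200)]
print(transfer_colors(photo, palette))
```

A global transfer like this changes every pixel the same way, so it cannot make the content-aware adjustments visible in the results above. It mainly illustrates what "grading toward a palette" means numerically.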

7. Infographics and diagrams

An infographic is a visual representation of information or data designed to make complex ideas easier to understand quickly. So far, I haven't seen a model that can produce useful infographics.

Let's put GPT-4o to the test by asking it to generate an infographic explaining why there are so many earthquakes in Taiwan.

Infographics on GPT-4o image generation

The first result was quite inaccurate: both the location and the spelling of Taiwan were incorrect. I asked it to fix these and got a better result. However, the new result is still not perfect because the end of the explanation is cut off.

This shows the model isn't perfect yet. However, I've seen a lot of examples online where it did pretty well at this task.

As an online educator, I often need to create diagrams for my content. I tried asking GPT-4o to generate diagrams for me, but I couldn't get a good result. Here's what I got when asking for a diagram illustrating Merge Sort. The diagram captures the right idea, but the details (the values shown at each split and merge step) are incorrect.

Diagram on GPT-4o image generation - wrong result 

Overall, I feel this is an area where these models still need a lot of improvement.
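
One way to check a generated Merge Sort diagram is to compare it against a reference trace of the algorithm. The sketch below is a minimal example, assuming the usual top-down variant that splits the list in half. The function name `merge_sort_trace` and the sample input are hypothetical and are not taken from the diagram above.

```python
def merge_sort_trace(values, depth=0, log=None):
    """Sort `values` with merge sort and record every split and merge step."""
    if log is None:
        log = []
    if len(values) <= 1:
        return list(values), log

    mid = len(values) // 2
    log.append(("split", depth, values[:mid], values[mid:]))

    # Sort each half recursively, sharing the same log.
    left, _ = merge_sort_trace(values[:mid], depth + 1, log)
    right, _ = merge_sort_trace(values[mid:], depth + 1, log)

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])

    log.append(("merge", depth, merged))
    return merged, log


sorted_values, steps = merge_sort_trace([38, 27, 43, 3])
for step in steps:
    print(step)
```

Each logged step corresponds to one box or arrow a correct diagram should contain, which makes it easier to spot where a generated diagram goes wrong.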

8. Adding elements to an existing image

Finally, I tried modifying an existing photo by adding elements to it. In this example, I have a photo from inside a tea shop, and I asked it to draw a teacup on the table:

Adding objects to an image with GPT-4o

I had tried to generate this image from scratch using DALL-E before, but each time, the overall look and feel of the image wasn't very realistic. Being able to add elements to a real photograph makes it much easier to get the result I was going for.

Conclusion

In this article, we explored GPT-4o image generation and its capabilities. Through eight practical examples, we saw how the model can render text within images, handle transparency, edit and color-grade existing photos, and add elements to them. Character consistency was less reliable: the results worked as inspiration but were not consistent enough to use as is.

I feel it still has a lot of room to improve when it comes to infographics and diagrams. The images it generates in these cases are coherent with the prompts but lack accuracy and factual consistency.

I haven't been this excited about an AI release in a long time. In my opinion, GPT-4o is a true game changer in the field of image generation. I'm thrilled to experiment with it further and already have numerous ideas I can't wait to explore and bring to life.


Author
François Aubry
Full-stack engineer & founder at CheapGPT. Teaching has always been my passion. From my early days as a student, I eagerly sought out opportunities to tutor and assist other students. This passion led me to pursue a PhD, where I also served as a teaching assistant to support my academic endeavors. During those years, I found immense fulfillment in the traditional classroom setting, fostering connections and facilitating learning. However, with the advent of online learning platforms, I recognized the transformative potential of digital education. In fact, I was actively involved in the development of one such platform at our university. I am deeply committed to integrating traditional teaching principles with innovative digital methodologies. My passion is to create courses that are not only engaging and informative but also accessible to learners in this digital age.