As a photographer and someone interested in art in general, I'm always intrigued when a new image-generation model comes out. OpenAI’s GPT-4o image generation truly blew me away.
I have ideas in my mind that I’d like to express visually, but sometimes I find it hard to bring them to life. I keep hoping a model will come along that can bridge the gap between reality and my vision. The new model might just be that bridge.
In this article, I'll showcase the capabilities of OpenAI's new image generation model through eight practical examples. We've also created a video with three extra real-world use cases.
What Is GPT-4o Image Generation?
GPT-4o image generation is a new feature in the GPT-4o model that allows users to create images directly within ChatGPT. This feature brings native image generation to the platform, making it accessible for various purposes like creativity, education, and more.
The launch represents a big leap forward from prior image generation technologies, as it aims to make the creation of images more accurate, user-friendly, and useful across many situations. For instance, users can now generate images by providing specific prompts, blending images with text, or even editing images through simple instructions.
Overall, GPT-4o image generation can be used for various creative tasks, such as making comics, designing trading cards, crafting memes, or creating educational materials that explain complex topics. As a test, I prompted ChatGPT to summarize the content of this section as an infographic:

Example infographic using GPT-4o image generation
How to Access GPT-4o Image Generation?
The GPT-4o image generation feature is available as the default image generator in ChatGPT. According to OpenAI, it is available for Plus, Pro, Team, and Free users. However, I couldn't get it to work on my Free plan, and OpenAI later confirmed that access is not yet available on the Free plan because of high demand.
Developers will have the opportunity to generate images with GPT-4o through the API in the coming weeks.
You can create images with GPT-4o by selecting the GPT-4o model and providing a text prompt describing what you want it to generate.

We can also keep chatting to request changes:

GPT-4o Image Generation Examples
Now that we've covered how to use the model, let's demonstrate what it can do through eight practical examples.
OpenAI claims that this new model doesn't just generate pretty images. It is able to generate images that are actually useful in the real world. In my opinion, for an image generation model to be truly useful, it must be able to modify existing images or apply existing styles consistently.
In real-life situations, we usually don't want an image from scratch. Rather, we have a style and want to generate an image in that style, or we have a photo and need to modify it in some way. Here are a few examples:
- A coffee shop owner wanting to post a marketing photo doesn't want an image of a random coffee shop—they want a photo of their coffee shop.
- If I am using AI to create a visual story, I need to be able to keep a consistent character throughout the story. It's of no use if the images aren't consistent.
- As a photographer, I have no interest in generating an image from scratch that doesn't exist in real life. Rather, I want to be able to edit an existing photograph.
1. Text
We already saw in the logo example that GPT-4o can generate text in images. Generating stand-alone text is probably the easiest example.
To test this further, I tried generating text on an object:

This example showcases two important features:
- The model is able to render text on an object in a way that follows the shape of the object.
- The model can understand colors and follow a color scheme.
To push the model further, I asked it to generate longer text and display it in the image in a readable way. Here is the result:

I was impressed by this. Other models I've tried in the past have not performed this task so well.
2. Transparency
GPT-4o is able to generate images with transparent areas. This is especially useful for images that are meant to be overlaid on top of other content, like stickers of characters from a game.
I took a photo of myself and asked GPT-4o to create a pixel art character based on it. Here's the result:

Note that it didn't generate a transparent background by default, but asking for it worked well and didn't alter the original result.
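To make concrete why a transparent background matters for stickers, here is a minimal sketch of the standard "over" compositing rule applied pixel by pixel. It is only an illustration and says nothing about how GPT-4o or ChatGPT handles transparency internally. The function names `composite_over` and `overlay` are hypothetical, and images are modeled as nested lists of pixel tuples rather than real image files.

```python
def composite_over(fg, bg):
    """Blend one RGBA foreground pixel over one opaque RGB background pixel.

    fg: (r, g, b, a), bg: (r, g, b), all channels in 0..255.
    """
    r, g, b, a = fg
    alpha = a / 255  # 0.0 = fully transparent, 1.0 = fully opaque
    return tuple(
        round(alpha * f + (1 - alpha) * c) for f, c in zip((r, g, b), bg)
    )


def overlay(sticker, background):
    """Composite a sticker (rows of RGBA pixels) over a same-sized RGB background."""
    return [
        [composite_over(fg, bg) for fg, bg in zip(sticker_row, background_row)]
        for sticker_row, background_row in zip(sticker, background)
    ]


# A 1x2 "sticker": one fully transparent pixel and one opaque red pixel.
sticker = [[(0, 0, 0, 0), (255, 0, 0, 255)]]
background = [[(10, 20, 30), (10, 20, 30)]]

result = overlay(sticker, background)
print(result)  # [[(10, 20, 30), (255, 0, 0)]]
```

Where the sticker is transparent, the background shows through unchanged. Where it is opaque, the sticker replaces the background. Without an alpha channel, the sticker's background color would cover whatever it is placed on.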
3. Character consistency
Based on the previous conversation, I tried to generate a scene using the pixel art character I had generated. This was the result:

The character in this image has a different resolution from the original one and more detail, so it seems that GPT-4o generated a new character from the photo rather than reusing the one it had created before.
It's still a nice result, but it's not usable as is in a game because we need the two characters to be more consistent. At this stage, it's better suited as inspiration for a pixel artist than as an end result in itself.
4. Creating a detailed story
Next, I wanted to create a comic book strip to tell the story of how I took a cityscape photo of Taipei a few months back. I used this to test how GPT-4o handles generating an image from detailed instructions.
I started by asking the model to generate a comic book character based on myself. Then, I provided the details of each frame in the comic book strip.

The first result was close to what I wanted, but not fully accurate. Also, I felt again that the model generated a new character rather than using the first one it generated.
However, I was very pleased with the result after I requested changes. It was an interesting feeling to see that night come to life as a comic book strip.

I particularly loved that it was able to mimic the photo in the last frame. I think it elevated the result.
5. Photo editing
Next, I tried photo editing. A few months ago, I was traveling back to Europe, and I took a photo before boarding the plane. Unfortunately, there was an annoying reflection on the window because I took the photo from the inside. I tried using Photoshop to remove it but didn't succeed.
I tried again using GPT-4o, and it worked really well.

Here are a few other examples of editing a photo using GPT-4o:

Again, it's not perfect but still pretty good. In the first example, the people were removed but the building in the back got modified. The night photos are nice but a little bit too dark.
Another interesting detail is that due to the conversational aspect of GPT-4o, it tends to apply the new changes to the latest image. In this case, when I requested the rain, I was expecting it to modify the original image, not the night image.
We can get around this by specifying the image in the prompt or starting a new conversation.
6. Color grading
Most of my photo editing consists of adjusting the colors, not modifying the content of the photo.
I was curious to see how good GPT-4o is at color grading, so I tried it on one of my photos. One of my favorite movies is Blade Runner 2049, and I like its overall aesthetic, so I wanted to see if GPT-4o could color-grade one of my urban photos in that style. Here's the result:

I loved the result. It saved me so much time compared to editing it myself. I also really enjoy the fact that it (mostly) preserved the integrity of the image.
In this example, I described the desired result in text. I also tried giving it a sample image with a color palette to see if it could color-grade my photo in that style. In my opinion, it did a very good job.
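For readers unfamiliar with the term, color grading changes the colors of each pixel while leaving the content and geometry of the photo untouched. The sketch below shows that idea with a simple split-toning transform that pushes shadows toward teal and highlights toward orange. It is a hypothetical illustration, not what GPT-4o does internally. The names `grade_pixel` and `grade_image`, the tint values, and the `strength` parameter are all assumptions chosen for the example.

```python
def grade_pixel(rgb, shadow_tint=(0, 128, 128), highlight_tint=(255, 160, 64),
                strength=0.3):
    """Tint one RGB pixel: dark pixels lean toward shadow_tint, bright ones toward highlight_tint."""
    r, g, b = rgb
    # Rec. 709 luma weights give a brightness estimate in the range 0..1.
    luma = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255
    # Interpolate between the two tints according to brightness.
    tint = tuple(s + (h - s) * luma for s, h in zip(shadow_tint, highlight_tint))
    # Mix the original color with the tint and clamp to the valid range.
    return tuple(
        min(255, max(0, round((1 - strength) * c + strength * t)))
        for c, t in zip(rgb, tint)
    )


def grade_image(pixels, **kwargs):
    """Apply grade_pixel to every pixel; the image dimensions never change."""
    return [[grade_pixel(p, **kwargs) for p in row] for row in pixels]


# A tiny 1x2 "photo": one dark pixel and one bright pixel.
photo = [[(20, 20, 20), (230, 230, 230)]]
graded = grade_image(photo, strength=0.3)
print(graded)
```

Because the transform only maps colors to colors, every pixel stays where it was, which is the property I mean by preserving the integrity of the image.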

7. Infographics and diagrams
An infographic is a visual representation of information or data designed to make complex ideas easier to understand quickly. So far, I haven't seen a model that can produce useful infographics.
Let's put GPT-4o to the test by asking it to generate an infographic explaining why there are so many earthquakes in Taiwan.

The first result was quite inaccurate: both the location and the spelling of Taiwan were wrong. I asked the model to fix this and got a better result, though still not a perfect one, because the end of the explanation was cut off.
This shows the model isn't perfect yet. However, I've seen a lot of examples online where it did pretty well at this task.
As an online educator, I often need to create diagrams for my content. I tried asking GPT-4o to generate diagrams for me, but I couldn't get a good result. Here's what I got when asking for a diagram illustrating Merge Sort. The diagram captures the right idea, but the details are incorrect.
Overall, I feel this is an area where these models still need a lot of improvement.
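When checking a generated Merge Sort diagram, it helps to have the real sequence of steps to compare against. The following is a minimal sketch of a standard top-down merge sort that records every split and merge. The `trace` and `depth` parameters are additions for illustration, not part of any particular library.

```python
def merge_sort(items, trace=None, depth=0):
    """Return a sorted copy of items; append a line to trace for every split and merge."""
    if trace is None:
        trace = []  # caller did not ask for a trace, so record into a throwaway list
    if len(items) <= 1:
        return list(items)  # zero or one element is already sorted

    mid = len(items) // 2
    left, right = list(items[:mid]), list(items[mid:])
    indent = "  " * depth
    trace.append(f"{indent}split {left} | {right}")

    sorted_left = merge_sort(left, trace, depth + 1)
    sorted_right = merge_sort(right, trace, depth + 1)

    # Merge the two sorted halves by repeatedly taking the smaller front element.
    merged, i, j = [], 0, 0
    while i < len(sorted_left) and j < len(sorted_right):
        if sorted_left[i] <= sorted_right[j]:
            merged.append(sorted_left[i])
            i += 1
        else:
            merged.append(sorted_right[j])
            j += 1
    merged.extend(sorted_left[i:])
    merged.extend(sorted_right[j:])

    trace.append(f"{indent}merge -> {merged}")
    return merged


steps = []
print(merge_sort([38, 27, 43, 3], steps))  # [3, 27, 38, 43]
print("\n".join(steps))
```

Each line of the trace corresponds to one box or arrow a correct diagram should show, so a generated diagram can be checked level by level against it.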
8. Adding elements to an existing image
Finally, I tried modifying an existing photo by adding elements to it. In this example, I have a photo from inside a tea shop, and I asked it to draw a teacup on the table:

I had tried to generate this image from scratch using DALL-E before, but each time, the overall look and feel of the image wasn't very realistic. Being able to add elements to a real photograph makes it much easier to get the result I was going for.
Conclusion
In this article, we explored GPT-4o image generation through eight practical examples, from rendering text within images and handling transparency to editing and color grading existing photos. Most of them showed how versatile and effective GPT-4o is at bringing creative visions to life, although character consistency across images was only partial in my tests.
I feel it still has a lot of room to improve when it comes to infographics and diagrams. The images it generates in these cases are coherent with the prompts but lack accuracy and factual consistency.
I haven't been this excited about an AI release in a long time. In my opinion, GPT-4o is a true game changer in the field of image generation. I'm thrilled to experiment with it further and already have numerous ideas I can't wait to explore and bring to life.




