Stability AI Announces Stable Diffusion 3: All We Know So Far

Find out about the new updates to Stable Diffusion and discover the capabilities of the version 3 text-to-image model.
Feb 2024

Stability AI announced an early preview of Stable Diffusion 3, their text-to-image generative AI model. Unlike OpenAI's Sora text-to-video reveal the previous week, this announcement offered only limited demonstrations of the model's new capabilities, though it did share some details. Here, we explore what the announcement means, how the new model works, and some implications for the advancement of image generation.

What is Stable Diffusion 3?

Stable Diffusion is a series of text-to-image generative AI models. That is, you write a prompt describing what you want to see, and the model creates an image matching your description. There is a web user interface for easy access to the AI.
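
If you prefer code to a web interface, existing Stable Diffusion models can be run in a few lines of Python with Hugging Face's diffusers library. Since the Stable Diffusion 3 weights are not yet public, this minimal sketch assumes the earlier Stable Diffusion 2.1 checkpoint; once the new weights are released, they should slot into a similar pipeline.

```python
# A minimal text-to-image sketch with the Hugging Face `diffusers` library.
# Assumption: Stable Diffusion 3 weights are not yet public, so this uses the
# earlier stabilityai/stable-diffusion-2-1 checkpoint as a stand-in.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move to a GPU if one is available

prompt = "Epic anime artwork of a wizard atop a mountain at night"
image = pipe(prompt).images[0]  # run the full denoising loop
image.save("wizard.png")
```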

One major difference from OpenAI's rival DALL·E image generation AI is that Stable Diffusion has "open weights". That is, the details of the neural network that performs the model's computations are publicly available. This means there is some transparency in how the model works, and it is possible for researchers to adapt and build on the work of Stability AI.

Stable Diffusion 3 is not one model but a whole family of models, with sizes ranging from 800 million parameters to 8 billion parameters. More parameters result in higher-quality output, but with the side effect that images are more expensive and slower to create. Versions of the model with fewer parameters are better for creating simple images quickly and cheaply, while versions with more parameters are better suited to creating higher-quality or more complex images.
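
To get a feel for these sizes, you can count the parameters of whichever model you have loaded. This is an illustrative helper built on the pipeline sketched above; note that pipe.unet is the backbone of Stable Diffusion 2.1, not the new Stable Diffusion 3 transformer.

```python
# Count the trainable parameters of a PyTorch model, e.g. the denoising
# backbone of the pipeline loaded in the previous snippet.
def count_parameters(model) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# For the Stable Diffusion 2.1 pipeline above (an assumption, since the
# Stable Diffusion 3 weights are not yet available):
print(f"{count_parameters(pipe.unet):,}")  # roughly 0.9 billion for SD 2.1
```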

How does Stable Diffusion 3 work?

Stable Diffusion 3 uses a diffusion transformer architecture, similar to the one used by Sora. Previous versions of Stable Diffusion, like most current image generation AIs, use a diffusion model. Large language models for text generation, like GPT, use a transformer architecture. Combining the two architectures is a recent innovation and promises to harness the best of both.

Diffusion models perform well at creating detail in small regions but are poor at generating the overall layout of an image. Conversely, transformers are good at layout but poor at creating detail. So it is likely that Stable Diffusion 3 will use a transformer to lay out the overall picture and then use diffusion to generate the detail within patches, as the sketch below illustrates.
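
To make the idea concrete, here is a heavily simplified, illustrative sketch of a diffusion transformer: the noisy image (or latent) is cut into patches, each patch becomes a token, and a standard transformer processes all the tokens jointly, so global layout and local detail are modeled together. This is not Stability AI's actual architecture, just the general pattern.

```python
# Illustrative sketch of the diffusion transformer idea (NOT Stability AI's
# real architecture): patchify the noisy input, run a transformer over the
# patch tokens with a timestep embedding, then unpatchify the prediction.
import torch
import torch.nn as nn

class TinyDiffusionTransformer(nn.Module):
    def __init__(self, channels=4, patch=2, dim=256, depth=4, heads=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(channels * patch * patch, dim)    # patch -> token
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.unembed = nn.Linear(dim, channels * patch * patch)  # token -> patch

    def forward(self, x, t):
        b, c, h, w = x.shape
        p = self.patch
        # Patchify: (b, c, h, w) -> (b, num_patches, c*p*p)
        tokens = x.unfold(2, p, p).unfold(3, p, p)
        tokens = tokens.reshape(b, c, -1, p * p).permute(0, 2, 1, 3).reshape(b, -1, c * p * p)
        # Every token attends to every other token, plus the timestep embedding,
        # so the model can coordinate the global layout of the image.
        tokens = self.embed(tokens) + self.time_mlp(t.view(b, 1))[:, None, :]
        tokens = self.blocks(tokens)
        out = self.unembed(tokens)
        # Unpatchify back to (b, c, h, w)
        gh, gw = h // p, w // p
        return out.reshape(b, gh, gw, c, p, p).permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
```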

That means that we can expect Stable Diffusion 3 to perform better than its predecessors in organizing complex scenes.

The announcement also states that Stable Diffusion 3 uses a technique called flow matching. This is a more computationally efficient way of training models, and of generating images from them, than the diffusion-path technique used by current models. That should make the model cheaper to train and images cheaper to generate.
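
Stability AI hasn't published the training details yet, but the general flow matching recipe is simple to sketch. In the rectified-flow variant, the network learns the constant velocity that carries a noise sample to a data sample along a straight line, so training requires no simulation of a diffusion path. The model here is assumed to take (x_t, t) and predict a velocity, like the sketch above.

```python
# A minimal sketch of one (rectified) flow matching training step. The network
# learns the constant velocity (x1 - x0) that moves a noise sample x0 to a data
# sample x1 along a straight line. `model` is any velocity predictor, e.g. the
# TinyDiffusionTransformer above; this is an illustration, not SD3's recipe.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1):
    """x1: a batch of training images/latents, shape (b, c, h, w)."""
    b = x1.shape[0]
    x0 = torch.randn_like(x1)            # noise endpoint of the path
    t = torch.rand(b, device=x1.device)  # random time in [0, 1]
    t_ = t.view(b, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1         # point on the straight path
    target_velocity = x1 - x0            # d(xt)/dt along that path
    pred = model(xt, t)
    return F.mse_loss(pred, target_velocity)
```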

What are the limitations of Stable Diffusion 3?

One of the current limitations of image generation AI is generating text within images. Notably, the Stability AI announcement began with an image that included the name of the model, "Stable Diffusion 3". The positioning of the letters in the text is good but not perfect: notice that the gap between the "B" and the "L" in "Stable" is wider than the gap between the "L" and the "E". Similarly, the two "F"s in "Diffusion" are too close together. Overall, though, this is a noticeable improvement over the previous generation of models.

Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Another issue with the models is that because diffusers generate patches of the image separately, inconsistencies can occur between regions of the image. This is mostly a problem when trying to generate realistic images. The announcement post didn't include many realistic examples, but an image of a bus in a city street reveals a few instances of these problems. Notice that the shadow underneath the bus suggests light coming from behind the bus, but the shadow of a building on the street indicates light coming from the left of the image. Similarly, the positioning of the windows in the building at the top right of the image is slightly inconsistent across different regions of the building. The bus also has no driver, though this may be fixable with more careful prompting.

[Image: a bus on a city street, generated by Stable Diffusion 3]

How can I access Stable Diffusion 3?

Stable Diffusion 3 is in an "early preview" state, meaning it is only available to researchers for testing purposes. The preview period allows Stability AI to gather feedback about the model's performance and safety before it is released to the public.

You can join the waiting list for access to the AI here.

What are the use cases of Stable Diffusion 3?

Image generation AIs have already found many use cases, from illustration to graphic design to marketing materials. Stable Diffusion 3 promises to be usable in the same ways, with the added advantage that it is likely to be able to create images with more complex layouts.

What are the risks of Stable Diffusion 3?

The dataset that Stable Diffusion was trained on included some copyrighted images, which has resulted in several as-yet-unresolved lawsuits. It is unclear what the outcome of these lawsuits will be, but it is possible that images created with Stable Diffusion could also be found to infringe copyright.

What Don't We Know Yet?

The full technical details of Stable Diffusion 3 have not been released yet, and in particular, there is no way to test the performance of the AI. Once the model is publicly available and benchmarks are established, it will be possible to determine how much of an improvement the AI is over previous models. Other factors such as the time and cost to generate an image will also become clear.

One technical development that was heavily championed by OpenAI in their DALL·E 3 paper, but not mentioned in the Stability AI announcement, is recaptioning. This is a form of automatic prompt engineering, where the text written by the user is restructured and given extra detail to provide clearer instructions to the model. It is unknown whether Stable Diffusion 3 makes use of this technique.
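
As a rough illustration of the idea, recaptioning can be approximated by asking a language model to rewrite the user's prompt before image generation. Everything in this sketch, the chat model name and the rewriting instructions, is a hypothetical example, not something Stability AI has confirmed doing.

```python
# A sketch of recaptioning as automatic prompt expansion: a language model
# rewrites the user's short prompt into a detailed caption before it is sent
# to the image model. The model name and instructions are assumptions for
# illustration, not Stability AI's (or OpenAI's exact) pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recaption(user_prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's image prompt as a single detailed caption: "
                "specify subject, style, lighting, composition, and background."
            )},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

# recaption("a wizard casting a spell")
# -> e.g. "An epic fantasy illustration of a robed wizard on a mountain peak..."
```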

Closing thoughts

Stable Diffusion 3 promises to be another step forward in the progress of text-to-image generative AI. Once the AI is publicly released, we'll be able to test it further and discover new use cases. If you’re eager to get started in the world of generative AI, our AI Fundamentals skill track will help you get up to speed with machine learning, deep learning, NLP, generative models, and more.

For more resources on the latest in the world of AI, check out the list below:


Author
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Related

- Blog: What is OpenAI's GPT-4o? Launch Date, How it Works, Use Cases & More, by Richie Cotton
- Blog: AI Ethics: An Introduction, by Vidhi Chugh
- Podcast: The 2nd Wave of Generative AI with Sailesh Ramakrishnan & Madhu Iyer, Managing Partners at Rocketship.vc
- Podcast: The Venture Mindset with Ilya Strebulaev, Economist Professor at Stanford Graduate School of Business
- Tutorial: Phi-3 Tutorial: Hands-On With Microsoft’s Smallest AI Model, by Zoumana Keita
- Tutorial: How to Use the Stable Diffusion 3 API, by Kurtis Pykes