Stability AI Announces Stable Diffusion 3: All We Know So Far

Find out about the new updates to Stable Diffusion and discover the capabilities of the version 3 text-to-image model.

Updated Feb 2024

Stability AI announced an early preview of Stable Diffusion 3, their text-to-image generative AI model. Unlike last week's Sora text-to-video announcement from OpenAI, there were limited demonstrations of the model's new capabilities, but some details were provided. Here, we explore what the announcement means, how the new model works, and some implications for the advancement of image generation.

What is Stable Diffusion 3?

Stable Diffusion is a series of text-to-image generative AI models. That is, you write a prompt describing what you want to see, and the model creates an image matching your description. There is a web user interface for easy access to the AI.

One major difference from OpenAI's rival DALL·E image generation AI is that it has "open weights". That is, the trained parameters of the neural network that performs the model's computations are publicly available. That means there is some transparency in how the model works, and it is possible for researchers to adapt and build on the work of Stability AI.

Stable Diffusion 3 is not one model, but a whole family of models, with sizes ranging from 800 million parameters to 8 billion parameters. More parameters result in a higher quality of output, but have the side-effect that images are more expensive and take longer to create. Versions of the model with fewer parameters are better for creating simple images, and versions with more parameters are better suited to creating higher quality or more complex images.
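As a rough illustration of why parameter count drives cost, the sketch below estimates the memory needed just to hold each model's weights. The figure of 2 bytes per parameter assumes 16-bit weights. That is an assumption made for illustration, not something stated in the announcement, and the function name is hypothetical.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes).

    bytes_per_parameter=2 assumes 16-bit weights (an illustrative assumption).
    """
    return num_parameters * bytes_per_parameter / 1e9


# The two ends of the size range given in the announcement.
sizes = {
    "smallest (800 million)": 800_000_000,
    "largest (8 billion)": 8_000_000_000,
}

for name, count in sizes.items():
    print(f"{name}: ~{weight_memory_gb(count):.1f} GB")
# smallest (800 million): ~1.6 GB
# largest (8 billion): ~16.0 GB
```

This counts only the stored weights. Actually generating an image needs extra working memory on top, so the real requirements would be higher.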

How does Stable Diffusion 3 work?

Stable Diffusion 3 uses a diffusion transformer architecture, similar to the one used by Sora. Previous versions of Stable Diffusion, and most current image generation AIs, use a diffusion model built around a convolutional U-Net. Large language models for text generation, like GPT, use a transformer architecture. Being able to combine the two approaches is a recent innovation and promises to harness the best of both architectures.

Diffusion models perform well at creating detail in small regions but are poor at generating the overall layout of an image. Conversely, transformers are good at layout but poor at creating detail. So it is likely that Stable Diffusion 3 will use a transformer to lay out the overall picture and then use diffusion to generate patches.

That means that we can expect Stable Diffusion 3 to perform better than its predecessors in organizing complex scenes.
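To make the idea of "patches" concrete: in published diffusion transformer designs, the image (or a compressed version of it) is cut into small square patches, and each patch becomes one token for the transformer, much like a word in a sentence. Whether Stable Diffusion 3 does exactly this had not been confirmed at the time of writing, so treat the sketch below as an illustration of the general idea only. The `patchify` helper is hypothetical and uses a toy 4x4 grid of numbers in place of a real image.

```python
def patchify(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Each patch is flattened into a single list, which plays the role of
    one "token" that a transformer would attend over.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            tokens.append(token)
    return tokens


# A toy 4x4 single-channel "image" holding the values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]

tokens = patchify(image, patch_size=2)
print(len(tokens))  # 4
print(tokens[0])    # [0, 1, 4, 5]
```

Because the transformer sees every token at once, each patch can take the rest of the picture into account, which is the intuition behind expecting better handling of overall layout.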

The announcement also states that Stable Diffusion 3 uses a technique called flow matching. This is a more computationally efficient way of training models, and of creating images from those models, than the current diffusion path technique. That means the model should be cheaper to train and images should be cheaper to generate, lowering the overall cost of using it.
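The announcement does not say which variant of flow matching is used, so the following is only a minimal sketch of one common formulation, assuming a straight-line path between a noise sample and a data sample. Along that path the model is trained to predict a velocity, and for a straight line the target velocity is simply "data minus noise". All function names here are hypothetical, and an exact "oracle" velocity stands in for a trained network.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight-line path: noise at t=0, data at t=1."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Regression target for the model on a straight-line path."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(start, velocity_fn, num_steps):
    """Generate a sample by following the velocity field from t=0 to t=1."""
    x = list(start)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [1.0, -2.0, 0.5]                       # stand-in for an image
noise = [random.gauss(0, 1) for _ in data]    # starting point for generation


def oracle(x, t):
    # Stand-in for a trained network: the exact velocity for this one pair.
    return target_velocity(noise, data)


sample = euler_sample(noise, oracle, num_steps=4)
print([round(value, 6) for value in sample])  # [1.0, -2.0, 0.5]
```

With a perfectly straight path, a handful of steps is enough to travel from noise to data. That is the intuition behind the efficiency claim: fewer sampling steps means less computation per image. A real model only approximates the velocity, so it would generally need more steps than this toy example.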

What are the limitations of Stable Diffusion 3?

One of the current limitations of image generation AI is the ability to generate text. Notably, the Stability AI announcement began with an image that included the name of the model, "Stable Diffusion 3". The positioning of the letters in the text is good but not perfect: notice that the distance between the "B" and the "L" in Stable is wider than the distance between the "L" and the "E". Similarly, the two "F"s in Diffusion are too close together. However, overall, this is a noticeable improvement over the previous generation of models.

Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Another issue with the models is that because diffusers generate patches of the image separately, inconsistencies can occur between regions of the image. This is mostly a problem when trying to generate realistic images. The announcement post didn't include many realistic examples, but an image of a bus in a city street reveals a few instances of these problems. Notice that the shadow underneath the bus suggests light coming from behind the bus, but the shadow of a building on the street indicates light coming from the left of the image. Similarly, the positioning of the windows in the building at the top right of the image is slightly inconsistent across different regions of the building. The bus also has no driver, though this may be fixable with more careful prompting.

How can I access Stable Diffusion 3?

Stable Diffusion 3 is in an "early preview" state. That means it is only available to researchers for testing purposes. The preview lets Stability AI gather feedback about the performance and safety of the model before it is released to the public.

You can join Stability AI's waiting list for access to the model.

What are the use cases of Stable Diffusion 3?

Image generation AIs have already found many use cases, from illustrations to graphic design to marketing materials. Stable Diffusion 3 promises to be usable in the same ways, with the added advantage that it is likely to be able to create images with more complex layouts.

What are the risks of Stable Diffusion 3?

The dataset that Stable Diffusion was trained on included some copyrighted images, which has resulted in several as-yet-unresolved lawsuits. It is unclear what the outcome of these lawsuits will be, but it is theoretically possible that any images created by Stable Diffusion will also be considered in breach of copyright.

What don't we know yet?

The full technical details of Stable Diffusion 3 have not been released yet, and in particular, there is no way to test the performance of the AI. Once the model is publicly available and benchmarks are established, it will be possible to determine how much of an improvement the AI is over previous models. Other factors such as the time and cost to generate an image will also become clear.

One technical development that was heavily championed by OpenAI in their DALL·E 3 paper, but not mentioned in the Stability AI announcement, is recaptioning. This is a form of automatic prompt engineering, where the text written by the user is restructured and given extra detail to provide clearer instructions to the model. It is unknown whether Stable Diffusion 3 makes use of this technique.

Closing thoughts

Stable Diffusion 3 promises to be another step forward in the progress of text-to-image generative AI. Once the AI is publicly released, we'll be able to test it further and discover new use cases. If you’re eager to get started in the world of generative AI, our AI Fundamentals skill track will help you get up to speed with machine learning, deep learning, NLP, generative models, and more.

Author

Richie Cotton

Topics

Artificial Intelligence (AI)
