How to Use the Stable Diffusion 3 API
Stability AI announced an early preview of Stable Diffusion 3 in February 2024. The model is still in preview, but in April 2024 the team announced that Stable Diffusion 3 and Stable Diffusion 3 Turbo would be available on the Stability AI Developer Platform API. The release was made in partnership with Fireworks AI, which Stability AI describes as the fastest and most reliable API platform on the market.
Stable Diffusion 3 is a family of text-to-image generative AI models. According to the team at Stability AI, the model is “equal to or outperforms” other text-to-image generators, such as OpenAI’s DALL-E 3 and Midjourney v6, in “typography and prompt adherence.”
In this tutorial, you will learn practical steps to get started with the API so you can start generating your own images.
Why Stable Diffusion 3?
Stable Diffusion 3 introduces several advancements that set it apart from its predecessors and make it highly competitive in text-to-image generation. Its main strengths are rendering legible text inside images and following prompts closely.
Let's explore these advancements:
Enhanced prompt following
- Richer text conditioning: Stable Diffusion 3 encodes prompts with three text encoders: two CLIP models and Google's T5-XXL. This helps it interpret long, detailed prompts more accurately than earlier versions.
- Multi-subject prompts: Stability AI reports that the model handles prompts describing several subjects much better. Each object and its attributes stay aligned with the description instead of being merged or dropped.
Improved text rendering
- Legible typography: Stable Diffusion 3 is markedly better at spelling words correctly inside images, for example on signs, labels, and posters. Earlier Stable Diffusion models often produced garbled lettering in these cases.
- Multimodal Diffusion Transformer (MMDiT): The model uses separate sets of weights for image and language representations. Stability AI credits this design for the gains in text understanding and spelling.
Scalable model family
- Multiple sizes: Stability AI announced the Stable Diffusion 3 suite in sizes ranging from 800M to 8B parameters. The range gives users options that balance scalability and image quality.
- Turbo variant: SD3 Turbo is a distilled version of the model that needs fewer sampling steps. This makes it faster and cheaper to call through the API.
Control and safety mechanisms
- Generation parameters: Through the API, you can steer results with several parameters:
  - a negative prompt, to describe what you don't want
  - an aspect ratio
  - a seed, to make results repeatable
  - an output format (JPEG or PNG)
- Safety measures: Stability AI states that it has taken steps to prevent misuse of the model. The API reports generations blocked by its content filter with a CONTENT_FILTERED finish reason.
Getting Started With Stable Diffusion 3 API
This section walks through the steps to get started with the Stability AI API.
Step 1: Create your account. You'll need to create an account before you can use Stability AI’s API. You can sign up using a username and password, but new users get 25 free credits for signing up using their Google account.
Step 2: Claim your API key. Once you’ve created your account, you’ll need an API key, which you can find on the API Keys page. According to the documentation, “All APIs documented on this site use the same authentication mechanism: passing the API key in via the Authorization header.”
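As a minimal sketch of that mechanism, the helper below builds the request headers used throughout this tutorial. The key goes in an Authorization header with the Bearer scheme, and Accept: image/* asks for raw image bytes. The layout mirrors the headers built in send_generation_request in the Colab steps. The function name build_stability_headers is our own and is not part of any Stability AI SDK.

```python
def build_stability_headers(api_key: str, accept: str = "image/*") -> dict:
    """Return HTTP headers for a Stability AI API request.

    build_stability_headers is a hypothetical helper written for this tutorial.
    """
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Accept": accept,  # "image/*" asks for raw image bytes in the response
        "Authorization": f"Bearer {api_key}",  # key passed via the Authorization header
    }


headers = build_stability_headers("sk-example")  # placeholder key
print(headers["Authorization"])  # Bearer sk-example
```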
Step 3: Top up credits. You need credits to call the API. Credits are the unit of currency consumed by each API call, and the amount consumed varies across models and modalities. When you run out, you can buy more from your Billing dashboard at $1 per 100 credits.
In this tutorial, we will use Google Colab and ComfyUI to demonstrate how to generate images using the Stable Diffusion 3 API. In the next section, we will cover the steps to get started using each tool.
Using the Stable Diffusion 3 API with Google Colab
To use Google Colab, you need a Google account. If you don't have one, create it by following Google's sign-up instructions.
If you already have a Google account, open a new notebook and follow the steps below.
Note: The code used in this example is taken from the SD3_API tutorial by Stability AI.
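Step 1: Import the required libraries. Colab comes with requests, Pillow, and IPython preinstalled, so there is nothing extra to install.
```python
from io import BytesIO
import IPython
import json
import os
from PIL import Image
import requests
import time
from google.colab import output
```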
Step 2: Enter your Stability API key.
```python
import getpass

# To get your API key, visit https://platform.stability.ai/account/keys
STABILITY_KEY = getpass.getpass('Enter your API Key')
```
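If you rerun the notebook often, typing the key each time gets tedious. One option is to read the key from an environment variable and fall back to the getpass prompt only when it is missing. This is a sketch under the assumption that you store the key in an environment variable; the name STABILITY_API_KEY is our own choice, not one the API requires.

```python
import getpass
import os


def load_api_key(env_var: str = "STABILITY_API_KEY") -> str:
    """Return the API key from an environment variable, or prompt for it.

    STABILITY_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the key
    return getpass.getpass("Enter your API Key: ").strip()


# Usage in the notebook:
# STABILITY_KEY = load_api_key()
```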
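Step 3: Define a helper function to send requests.
```python
def send_generation_request(host, params):
    """POST a generation request to the Stability API and return the response."""
    headers = {
        "Accept": "image/*",  # ask for raw image bytes in the response body
        "Authorization": f"Bearer {STABILITY_KEY}",
    }

    # Encode parameters: attach the optional input image and mask files
    files = {}
    image = params.pop("image", None)
    mask = params.pop("mask", None)
    if image is not None and image != '':
        files["image"] = open(image, 'rb')
    if mask is not None and mask != '':
        files["mask"] = open(mask, 'rb')
    if len(files) == 0:
        # The endpoint expects multipart/form-data; a placeholder file
        # entry makes requests encode the body that way
        files["none"] = ''

    # Send request
    print(f"Sending REST request to {host}...")
    try:
        response = requests.post(
            host,
            headers=headers,
            files=files,
            data=params,
        )
    finally:
        # Close any files opened above
        for f in files.values():
            if hasattr(f, "close"):
                f.close()
    if not response.ok:
        raise Exception(f"HTTP {response.status_code}: {response.text}")
    return response
```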
Step 4: Generate images.
According to the documentation, the Stable Diffusion 3 API currently has two models in production:
- SD3: 6.5 credits per image
- SD3 Turbo: 4 credits per image
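Because the per-image cost differs by model, it can help to estimate how far a balance goes before you start experimenting. The sketch below uses the costs listed above and the rate of $1 per 100 credits from the credits step. Prices may change, so treat the constants as assumptions to check against current pricing.

```python
import math

# Per-image costs as listed in this tutorial; verify against current pricing
CREDITS_PER_IMAGE = {"sd3": 6.5, "sd3-turbo": 4.0}
CREDITS_PER_USD = 100  # $1 buys 100 credits


def images_affordable(model: str, credits: float) -> int:
    """Number of whole images a credit balance covers for the given model."""
    return math.floor(credits / CREDITS_PER_IMAGE[model])


def cost_in_usd(model: str, n_images: int) -> float:
    """Dollar cost of generating n_images with the given model."""
    return n_images * CREDITS_PER_IMAGE[model] / CREDITS_PER_USD


for model in CREDITS_PER_IMAGE:
    print(model, images_affordable(model, 25), "images from 25 free credits")
# sd3 3 images from 25 free credits
# sd3-turbo 6 images from 25 free credits
```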
Let’s test them out.
In this first example, we use SD3 to create an image of a toucan in a tropical lowland.
```python
# SD3
prompt = "This dreamlike digital art captures a vibrant, Toucan bird in a lowland tropic area" #@param {type:"string"}
negative_prompt = "" #@param {type:"string"}
aspect_ratio = "1:1" #@param ["21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21"]
seed = 0 #@param {type:"integer"}
output_format = "jpeg" #@param ["jpeg", "png"]

host = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

params = {
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "aspect_ratio": aspect_ratio,
    "seed": seed,
    "output_format": output_format,
    "model": "sd3",
    "mode": "text-to-image",
}

response = send_generation_request(host, params)

# Decode response
output_image = response.content
finish_reason = response.headers.get("finish-reason")
seed = response.headers.get("seed")

# Check for NSFW classification
if finish_reason == 'CONTENT_FILTERED':
    raise Warning("Generation failed NSFW classifier")

# Save and display result
generated = f"generated_{seed}.{output_format}"
with open(generated, "wb") as f:
    f.write(output_image)
print(f"Saved image {generated}")

output.no_vertical_scroll()
print("Result image:")
IPython.display.display(Image.open(generated))
```
Here’s what it created:
Image created by author using Stable Diffusion 3
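The SD3 cell above builds its params dictionary by hand, and the two models accept slightly different parameters. The SD3 Turbo example that follows leaves out negative_prompt. A small validation step can catch typos before a request spends credits. The sketch below is a hypothetical helper of our own, not part of the Stability API. It checks values against the options in the notebook's form fields. It includes negative_prompt only for SD3, assuming that sd3-turbo does not accept a negative prompt, as the API documentation stated at the time of writing. Its output can be passed to send_generation_request.

```python
# Options offered by the notebook's form fields
ASPECT_RATIOS = {"21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21"}
OUTPUT_FORMATS = {"jpeg", "png"}
MODELS = {"sd3", "sd3-turbo"}


def build_params(prompt: str, model: str = "sd3", negative_prompt: str = "",
                 aspect_ratio: str = "1:1", seed: int = 0,
                 output_format: str = "jpeg") -> dict:
    """Validate inputs and return a params dict for send_generation_request.

    build_params is a hypothetical helper written for this tutorial.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio {aspect_ratio!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format {output_format!r}")
    params = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "seed": seed,
        "output_format": output_format,
        "model": model,
        "mode": "text-to-image",
    }
    # Only send a negative prompt to SD3; the Turbo example omits it
    if model == "sd3" and negative_prompt:
        params["negative_prompt"] = negative_prompt
    return params


# Usage: params = build_params("A car made out of fruits.", model="sd3-turbo")
```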
Now, let’s create an image of a car made out of fruits using SD3 Turbo:
```python
# SD3 Turbo
prompt = "A car made out of fruits." #@param {type:"string"}
aspect_ratio = "1:1" #@param ["21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21"]
seed = 0 #@param {type:"integer"}
output_format = "jpeg" #@param ["jpeg", "png"]

host = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

params = {
    "prompt": prompt,
    "aspect_ratio": aspect_ratio,
    "seed": seed,
    "output_format": output_format,
    "model": "sd3-turbo",
}

response = send_generation_request(host, params)

# Decode response
output_image = response.content
finish_reason = response.headers.get("finish-reason")
seed = response.headers.get("seed")

# Check for NSFW classification
if finish_reason == 'CONTENT_FILTERED':
    raise Warning("Generation failed NSFW classifier")

# Save and display result
generated = f"generated_{seed}.{output_format}"
with open(generated, "wb") as f:
    f.write(output_image)
print(f"Saved image {generated}")

output.no_vertical_scroll()
print("Result image:")
IPython.display.display(Image.open(generated))
```
Running this code produced the following image:
Image created by author using Stable Diffusion 3 Turbo
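The SD3 and SD3 Turbo cells repeat the same decode-and-save logic. One way to remove the duplication is to move that logic into a helper, as sketched below. It relies only on the response attributes the cells already use: content, plus the finish-reason and seed headers. The function name save_generation is hypothetical. Unlike the original cells, it raises a RuntimeError rather than a Warning when the content filter blocks an image.

```python
from pathlib import Path


def save_generation(response, output_format: str, out_dir: str = ".") -> Path:
    """Write a generation response to disk and return the file path.

    save_generation is a hypothetical helper; `response` is any object with
    `.content` (bytes) and `.headers` (dict-like), such as a requests.Response.
    """
    if response.headers.get("finish-reason") == "CONTENT_FILTERED":
        raise RuntimeError("Generation failed NSFW classifier")
    seed = response.headers.get("seed")
    path = Path(out_dir) / f"generated_{seed}.{output_format}"
    path.write_bytes(response.content)
    print(f"Saved image {path}")
    return path


# Usage in the notebook:
# path = save_generation(response, output_format)
# IPython.display.display(Image.open(path))
```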
Using the API with ComfyUI
ComfyUI is a robust and flexible graphical user interface (GUI) for Stable Diffusion. Its graph-based, flowchart-style interface lets users build and run sophisticated Stable Diffusion workflows.
System requirements:
- Graphics Processing Unit (GPU): An adequate NVIDIA GPU with a minimum of 8GB of VRAM, such as the RTX 3060 Ti or better.
- Central Processing Unit (CPU): A modern processor, such as an Intel Core i5, Intel Xeon E5, or AMD Ryzen 5, or better.
- Random Access Memory (RAM): 16GB or greater.
- Operating System: Windows 10/11 or Linux.
- Adequate storage space on your computer for models and generated images.
Step 1: Install ComfyUI
The simplest way to install ComfyUI on Windows is to use the standalone package from the releases page. It bundles essential dependencies such as PyTorch and Hugging Face Transformers. You don't need to install them separately or set up any complex configuration.
Simply download, extract, add models, and launch!
Step 1.1: Download the standalone version of ComfyUI from the releases page of its GitHub repository. The download link starts the download immediately.
Step 1.2: Once you've downloaded the most recent comfyui-windows.zip file, extract it using a utility such as 7-Zip or WinRAR.
Step 1.3: ComfyUI needs a checkpoint model before you can use it. Download one from Stability AI or Hugging Face and put it in the folder:
ComfyUI_windows_portable\ComfyUI\models\checkpoints
Step 1.4: Run run_nvidia_gpu.bat (recommended) or run_cpu.bat. ComfyUI should open automatically in your browser. If it doesn't, the command window prints a local URL, http://127.0.0.1:8188/, which you can open manually.
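If the browser does not open, you can confirm from any Python shell that the server is up. This minimal sketch only checks whether http://127.0.0.1:8188/ answers an HTTP request with status 200. Port 8188 is ComfyUI's default, so adjust the URL if you launched it on a different port. The helper name is our own.

```python
import urllib.request


def comfyui_is_running(url: str = "http://127.0.0.1:8188/", timeout: float = 2.0) -> bool:
    """Return True if the server at `url` answers with HTTP 200.

    comfyui_is_running is a hypothetical helper written for this tutorial.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError
        return False


print("ComfyUI running:", comfyui_is_running())
```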
Step 2: Install ComfyUI Manager
In File Explorer, locate the folder you just extracted. On Windows, it should be named “ComfyUI_windows_portable.” From there, navigate to ComfyUI and then to custom_nodes. In that folder, type cmd in the address bar and press Enter.
This opens a Command Prompt window in that folder. Run the following command (Git must be installed on your system):
```shell
git clone https://github.com/ltdrdata/ComfyUI-Manager
```
Once it’s complete, restart ComfyUI. The new “Manager” button should appear on the floating panel.
Step 3: Install the Stability AI API node
Click the Manager button and select “Install Custom Nodes.” From there, search for “stability API.”
Locate the "Stability API nodes for ComfyUI" node, then click the Install button situated on the right side to initiate the installation process. Following this, a “Restart” button will become visible. Click on “Restart” to reboot ComfyUI.
Step 4: Define the system-wide API key
This step is optional but recommended. Instead of entering your API key in every Stability AI node, you can set one key that all of the Stability AI custom nodes use. This saves you from entering the key in every workflow. It also reduces the risk of accidentally sharing the key when you share your workflow JSON file.
To do so, navigate to the custom node directory:
ComfyUI_windows_portable > ComfyUI > custom_nodes > ComfyUI-SAI_API.
Create a new file named sai_platform_key.txt.
Paste your API Key into the file, save the document, and then restart ComfyUI.
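If you prefer to script this step, the sketch below writes the key file for you. The folder path in the usage comment is the Windows portable layout described above and is an assumption; point it at wherever your ComfyUI-SAI_API folder lives. The helper strips surrounding whitespace, because stray spaces or newlines can cause authentication errors. The function name is hypothetical.

```python
from pathlib import Path


def write_sai_key_file(node_dir: str, api_key: str) -> Path:
    """Write the API key to sai_platform_key.txt inside the custom node folder.

    write_sai_key_file is a hypothetical helper written for this tutorial.
    """
    folder = Path(node_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Custom node folder not found: {folder}")
    key_file = folder / "sai_platform_key.txt"
    # Strip surrounding whitespace and write without a trailing newline
    key_file.write_text(api_key.strip(), encoding="utf-8")
    return key_file


# Example (adjust the path to your installation):
# write_sai_key_file(r"ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-SAI_API", "<your-api-key>")
```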
Step 5: Load and run the workflow
Download the Stable Diffusion 3 text-to-image workflow and drag and drop it onto the ComfyUI canvas. Then click Queue Prompt to generate an image.
You’re now good to go!
Troubleshooting and Tips
As with any tool, there’s always a chance you’ll encounter a few issues along the way. Here are the most common challenges and troubleshooting steps for users facing issues with the API or the setup process.
API Key and authentication issues
Challenge: Users may face authentication errors when accessing the API due to an incorrect API key or wrong authentication credentials.
Troubleshooting: Double-check that the API key was copied and pasted correctly, with no extra spaces or characters. Also confirm that it is sent in the Authorization header as Bearer followed by your key. If authentication errors persist, generate a new key on the API Keys page and try again.
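A quick programmatic check can catch the copy-paste problems described above. This hypothetical helper, not part of any SDK, strips surrounding whitespace and quotes. It then rejects keys that still contain whitespace, a sign that something extra was pasted along with the key.

```python
def clean_api_key(raw: str) -> str:
    """Normalize a pasted API key and reject obviously malformed values."""
    key = raw.strip().strip("\"'")  # drop surrounding spaces, newlines, and quotes
    if not key:
        raise ValueError("API key is empty")
    if any(ch.isspace() for ch in key):
        raise ValueError("API key contains whitespace; check what was pasted")
    return key


print(clean_api_key('  "sk-abc123"\n'))  # sk-abc123
```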
Credit management problems
Challenge: Users may encounter issues related to credit management, such as insufficient credits or billing errors.
Troubleshooting: Check your credit balance in the Billing dashboard of your Stability AI account to make sure you have enough credits. Verify your billing information, and raise any billing errors or discrepancies with the support team.
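You can also check your balance from code. The sketch below assumes Stability AI's v1 account endpoint, GET https://api.stability.ai/v1/user/balance, which at the time of writing returns JSON with a credits field. Confirm the path in the current API reference before relying on it. The helper names are our own.

```python
import json
import urllib.request

BALANCE_URL = "https://api.stability.ai/v1/user/balance"  # assumed endpoint


def fetch_balance(api_key: str, timeout: float = 10.0) -> float:
    """Return the remaining credit balance for the given API key."""
    request = urllib.request.Request(
        BALANCE_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_balance(json.load(response))


def parse_balance(payload: dict) -> float:
    """Extract the credits value from the balance response body."""
    return float(payload["credits"])


def has_enough_credits(balance: float, credits_needed: float) -> bool:
    """True if the balance covers a request costing credits_needed."""
    return balance >= credits_needed


# Example: balance = fetch_balance(STABILITY_KEY)
#          has_enough_credits(balance, 6.5)  # one SD3 image
```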
Connectivity and network problems
Challenge: Users may experience connectivity issues or network interruptions that prevent them from accessing the API.
Troubleshooting: Ensure that you have a stable internet connection and that there are no network disruptions. To isolate the issue, try accessing the API from a different network or device. Contact your internet service provider if you continue to experience connectivity problems.
Compatibility and dependency errors
Challenge: Users may encounter compatibility issues or dependency errors when installing or using the required tools and libraries.
Troubleshooting: Check the compatibility requirements of the Stable Diffusion 3 API and ensure that you are using compatible versions of tools and libraries. Update or reinstall any dependencies that are causing errors. Refer to the documentation and community forums for troubleshooting guidance.
Performance and response time
Challenge: Users may experience slow response times or performance issues when interacting with the API, particularly during peak usage times.
Troubleshooting: Monitor the API's performance and track response times to identify patterns or trends. Consider upgrading to a higher-tier subscription plan for better performance and priority access. Contact the support team if you consistently experience slow response times.
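Slow responses sometimes come with rate-limit errors (HTTP 429) or temporary server errors (5xx). In that case, retrying with exponential backoff usually works better than retrying immediately. The sketch below is a generic pattern, not something the Stability API provides. `send` is any zero-argument callable that returns an object with a `status_code`, as shown in the usage comment for a text-to-image request.

```python
import time

# Status codes that usually indicate a temporary problem worth retrying
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff on retryable statuses.

    call_with_backoff is a hypothetical helper. It waits base_delay,
    2 * base_delay, 4 * base_delay, ... between attempts and returns the
    last response, which may still be an error.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_retries:
            return response
        delay = base_delay * (2 ** attempt)
        print(f"HTTP {response.status_code}; retrying in {delay:.1f}s")
        sleep(delay)


# Example usage in the Colab notebook (text-to-image request):
# response = call_with_backoff(lambda: requests.post(
#     host,
#     headers={"Authorization": f"Bearer {STABILITY_KEY}", "Accept": "image/*"},
#     files={"none": ""},
#     data=params,
# ))
```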
Documentation and support
Challenge: Users may encounter difficulties understanding the API documentation or require assistance troubleshooting specific issues.
Troubleshooting: For guidance on API usage, troubleshooting, and best practices, refer to the Stability AI API documentation. For unresolved issues or questions, contact the support team or ask on the community forums.
Conclusion
Stable Diffusion 3 is a series of text-to-image generative AI models. This article covered practical steps to start using the API with Google Colab and ComfyUI. Now, you have the skills to create your own images; be sure to apply what you learned as soon as possible so you do not forget.
Thanks for reading!
FAQs
What are some best practices for using Stable Diffusion 3 API effectively?
Best practices for using the Stable Diffusion 3 API include providing clear and specific prompts, experimenting with different parameters to achieve desired results, monitoring credit usage to avoid depletion, and staying updated with the latest documentation and features.
What is Stable Diffusion 3?
Stable Diffusion 3 is a family of AI models that generate images from text prompts. Users describe the image they want, and the model generates a corresponding image.
How does Stable Diffusion 3 work?
Stable Diffusion 3 uses a diffusion transformer architecture, similar to the one behind OpenAI's Sora. Earlier Stable Diffusion versions, like many other image generators, used a U-Net backbone instead. The new design combines the transformer architecture used in large language models such as GPT with diffusion modeling, aiming to leverage the strengths of both.