Leerpad
In this tutorial, I will demonstrate how to use the BytePlus Ark API to interact with Seedance 2 using Python.
Seedance 2 is a powerful generative AI model from ByteDance that functions as a "multimodal director," capable of processing text, images, video, and audio inputs simultaneously to create cinematic video sequences with synchronized audio. Whether you are looking to animate static images or bring complex text prompts to life, this guide will walk you through the setup and implementation.
Generating a Seedance 2 API Key
To get started, we need to create a BytePlus account and an API key. To do so:
- Go to BytePlus’s official website and create an account.
- Go to the API key creation page and click “Create API Key”.
This key is used to make requests to the ModelArk API, their official API.
The best practice is to store the key in a file named .env in the same folder where we write our Python scripts. Make sure to keep the key secret because anyone can use it to interact with the API using your account.
Paste the API key into the .env file with the following format:
ARK_API_KEY=<paste_api_key_here>
Activating the Seedance 2 model and pricing structure
To be able to use Seedance 2, we first need to activate the model:

Note that using the API isn't free and activating Seedance 2 requires purchasing credits for the model. For this article, I purchased a 7 million token pack for Seedance 2 for $30.10.

Unfortunately, the tokens are model-specific. This means that if we buy tokens for the fast or mini versions of Seedance 2, we can’t use them with the base model.
Below is a table with the pricing plans for the API:

How to Generate Videos With The Seedance 2 API In Python
I’ve included the complete scripts corresponding to the code created in this tutorial in this GitHub repository.
In this section, we learn how to make a text-to-video request to generate this video:
1. Setting up your Python environment and SDK
To connect with the BytePlus Ark API using Python, we'll use the official byteplus-python-sdk-v2 package.
The easier way to set up the dependencies for this project is to create an Anaconda environment and install the requirements.txt file, like so:
git clone git@github.com:fran-aubry/seedance2-tutorial.git
conda create --name seedance2 python=3.11
conda activate seedance2
pip install -r requirements.txt
2. Initializing the BytePlus Ark Python client
Create a new script called text_to_video.py in the same folder as the .env file we created before.
First, we need to import the necessary packages:
import os
from byteplussdkarkruntime import Ark
from dotenv import load_dotenv
Next, we load the API key from .env file using the load_dotenv() function:
load_dotenv()
API_KEY = os.getenv("ARK_API_KEY")
Now we initialize the Ark client, which allows us to make requests to the BytePlus Ark API.
We use the os library to get the API key from the environment variables and set the appropriate base URL for international API access:
client = Ark(
api_key=os.getenv("ARK_API_KEY"),
base_url="https://ark.ap-southeast.bytepluses.com/api/v3"
)
Finally, we set a prompt and use the content_generation.tasks.create() function from the client to generate the video:
prompt = """
A cinematic close-up of a weary but hopeful female astronaut inside a dimly lit, dusty spaceship cabin.
Soft blue light illuminates her face.
She takes a deep breath, showing subtle emotional relief, looks directly into the camera, and says:
'We finally made it. The atmosphere is stable.'
The ambient sound of a low, rhythmic spaceship engine hums in the background.
"""
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
}
],
generate_audio=True,
ratio="16:9",
duration=8,
watermark=True,
resolution="480p",
)
task_id = response.id
print(f"Task successfully submitted! Task ID: {task_id}")
After requesting to generate a video, we don't get a video immediately because the video takes some time to generate.
This means that the response variable isn't the video itself.
Instead, it’s an object that contains information about the video generation task.
In particular, it contains the task identifier, which lets us track the progress of the video generation and download the video once the model is done generating it.
3. Tracking video generation status (Task polling)
We can retrieve the status of a task by providing the task identifier to the content_generation.tasks.get() function, like so:
task = client.content_generation.tasks.get(task_id=task_id)
status = task.status
print(f"Status: {status}")
To make it easier to track video progress, we create a new script utils.py in the same folder and add this function that tracks the video progress given the client instance and task identifier:
import time
import requests
def poll_task(client, task_id, poll_interval=5):
while True:
try:
task_status = client.content_generation.tasks.get(task_id=task_id)
status = getattr(task_status, "status", None) or (task_status.get("status") if isinstance(task_status, dict) else None)
if status == "succeeded":
print("\nTask completed successfully.")
return task_status.content.video_url
elif status == "failed":
error_details = getattr(task_status, "error", "Unknown error occurred during processing.")
raise RuntimeError(f"Task failed: {error_details}")
print(".", end="", flush=True)
time.sleep(poll_interval)
except Exception as e:
if isinstance(e, RuntimeError):
raise
print(f"\nWarning: Error polling task (will retry): {e}")
time.sleep(poll_interval)
4. Downloading the generated video with Python requests
The poll_task() function we implemented above returns the video
Now that we can track the video generation process, all we need is a way to download it when it finishes generating.
Since the BytePlus Ark API returns a direct URL to the generated video file within the task response, we can download it using the requests library.
Here's a function that does this:
def download_video(url, output_path = "video.mp4"):
try:
print(f"Downloading video from {url} to {output_path}...")
response = requests.get(url, stream=True)
response.raise_for_status()
with open(output_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
f.write(chunk)
print(f"Video successfully downloaded to '{output_path}'")
return True
except Exception as e:
print(f"Error downloading video: {e}")
return False
Complete Python script: Text-to-video with Seedance 2
Putting all of these together, here’s a full script for generating a video from a text prompt using Seedance 2:
import os
from byteplussdkarkruntime import Ark
from dotenv import load_dotenv
import utils
load_dotenv()
API_KEY = os.getenv("ARK_API_KEY")
client = Ark(
api_key=os.getenv("ARK_API_KEY"),
base_url="https://ark.ap-southeast.bytepluses.com/api/v3"
)
prompt = """
A cinematic close-up of a weary but hopeful female astronaut inside a dimly lit, dusty spaceship cabin.
Soft blue light illuminates her face.
She takes a deep breath, showing subtle emotional relief, looks directly into the camera, and says:
'We finally made it. The atmosphere is stable.'
The ambient sound of a low, rhythmic spaceship engine hums in the background.
"""
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
}
],
generate_audio=True,
ratio="16:9",
duration=8,
watermark=True,
resolution="480p",
)
task_id = response.id
print(f"Task successfully submitted! Task ID: {task_id}")
# Wait for the video to finish generating
video_url = utils.poll_task(client, task_id)
utils.download_video(video_url, "./videos/space.mp4")
Seedance 2 API core parameters reference guide
Below is a list of accepted values for the core parameters of a video generation request:

Advanced Guide: Seedance 2.0 Image-to-Video Capabilities
In the previous section, we learned how to generate a video using a text prompt. Seedance 2 also supports image, video, and audio inputs, so let’s start with images.
Using reference images for character consistency
To test image inputs, I generated two characters using AI to use in a sitcom-like scene.

We can provide image references to the model by adding them to the content field of the request, like so:
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/male-character.png")
},
"role": "reference_image"
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/female-character.png")
},
"role": "reference_image"
}
],
generate_audio=True,
ratio="16:9",
duration=15,
watermark=True,
resolution="480p",
)
For this video, I used the following prompt:
Context: A cozy, warmly lit independent coffee shop counter with wooden accents.
Reference:
- The male barista's face and clothing follow @Image1.
- The female customer's face and styling follow @Image2.
Action & Dialogue (10 Seconds Total):
0:00 - 0:04:
Medium close-up on the male barista (@Image1) standing behind the counter,
next to the coffee machine. He leans across the counter with a smooth,
flirtatious smirk, acting like he is making a highly exclusive offer:
"The WiFi password? It's actually my phone number: five-five-five, zero-one—"
0:04 - 0:07:
Camera cuts to the female customer (@Image2), instantly interrupting him.
She holds up her phone screen toward him, entirely unimpressed:
"Your network is called 'Free_Guest_WiFi' and there is literally no password."
0:07 - 0:08: Quick cut back to the two-shot.
Flustered, the barista becomes visibly embarrassed, his confident smirk
instantly vanishing as he nervously breaks eye contact.
0:08 - 0:10: Camera cuts back to a medium shot of the female customer (@Image2).
She lowers her phone, tapping her fingers lightly on the counter,
and delivers her order with perfect deadpan clarity:
"It will be an espresso please. No milk, no sugar, and no phone number."
Framing/Timing:
Clean cuts to perfectly pace the comedic timing over exactly 10 seconds.
Tone/Audio:
Lighthearted sitcom vibe. Synchronized voice generation for the dialogue, a subtle low cafe background hum, and the sound of the customer's fingers lightly tapping the wooden counter at the end.
Note that the prompt uses @Image1 and @Image2 to refer to the images in the order they were provided in the list.
Unfortunately, when I tried to generate this video, I got an error saying:
The request failed because the input image may contain a real person.
I noticed this consistently as I tried to use Seedance 2 with a reference image containing a realistic human, even with AI-generated ones.
To overcome this, I converted the character into cartoons and tried again.
This time it worked perfectly, and I think the result was exactly what I wanted to create.
The full code for this example can be found in the image_to_video.py file.
Setting first and last frames for guided animation
The previous example showed how to use images as references by setting ”role”: “reference_image” in the request.
Seedance 2 supports two other image roles:
”first_frame”: Instructs the model to use the provided image as the exact starting frame of the generated video. This is the standard setting for direct Image-to-Video generation.”last_frame”: Instructs the model to end the generated video on this specific image. Note: You can only use this if you are also providing afirst_framein the same request, effectively asking the model to interpolate the video between the two images.
I tried this by generating a succession of initial frames showing a robot AI lab trying to recreate plants in a world where natural life has disappeared.

For each frame, I generated a video using that frame as the first frame. Here is how to structure each request:
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/robot-lab1.jpeg")
},
"role": "first_frame"
}
],
generate_audio=False,
ratio="16:9",
duration=4,
watermark=True,
resolution="480p",
)
I generated the videos without audio because the audio wouldn’t be consistent, since each video was generated separately. Here’s the result of putting the five clips together:
The full script used to generate a video based on a starting frame can be found here.
Image interpolation with last_frame
In this example, we provide both a first_frame and last_frame to generate a video between the two.
I generated two images of the same room, one empty without furniture and the other with the room fully furnished. The goal is to generate a video that animates the room being furnished.

The prompt I used described how I want the furniture to appear in the video:
A dynamic, fast-paced sequence where furniture pops into existence piece by piece. First, a rug rapidly unfolds itself, then the sofa drops from just above the floor with a heavy, realistic bounce. Lamps, artwork, and plants quickly spring into reality one after another. Vibrant lighting, photorealistic materials, lively and snappy energetic rhythm.
Here’s the request:
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/room-start.png")
},
"role": "first_frame"
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/room-end.png")
},
"role": "last_frame"
}
],
generate_audio=True,
ratio="16:9",
duration=8,
watermark=True,
resolution="480p",
)
And here’s the result:
The full script can be found here.
Seedance 2.0 Multimodal Inputs: Adding Audio and Video References
As a multimodal model, Seedance 2 goes beyond text and image inputs. It also supports video and audio inputs. These can be used as stylistic references, movement driving, or video editing.
This example from their official website illustrates this very well. They took a scene of two actors fighting and used that to drive the action, replacing the actors with reference images:
Submitting a video input
As with images, video inputs are provided in the content parameter of the request.
However, as of this writing, video inputs only support URLs. This means that if we want to use a video we have locally on our machine, we’ll need to host it first.
For this example, I used this free-to-use skating video by Marc Espejo. The idea was to use Seedance 2 to add some VFX. Here’s the result:
To generate this, I used the following request:
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
},
{
"type": "video_url",
"video_url": {
"url": "https://www.pexels.com/download/video/38213009/"
},
"role": "reference_video"
}
],
generate_audio=True,
ratio="16:9",
duration=6,
watermark=True,
resolution="480p",
)
The full script can be found here. The prompt I used was:
The original skater, environment, and camera movement must remain
unchanged and untouched.
Only add gaming-style visual effects: intense glowing sparks and electric arcs
streaming from the skateboard-rail contact point during the grind, and a
brilliant energy shockwaves and dust burst at the jump/pop.
Submitting an audio input
To test audio inputs, I generated an image for a character and used Fish Audio to generate an audio track with different emotions.

For the audio, I wanted a few different emotions to see if Seedance 2 was able to identify those when generating the video.
Fish Audio AI let us specify emotions using [emotion] in the prompt, so it was perfect for the job. Here’s the audio prompt I used:
[sighing] Nothing much is happening today. [surprised] Wait a minute... what are those on my ears?[laughing] Oh my gosh, I can't believe I got piercings last night! [angry] But hold on, I specifically told you not to let me do it!
To load the audio file, I wrote this function that I added to utils.py:
def load_audio(file_path):
if not os.path.exists(file_path):
raise FileNotFoundError(f"Audio file not found at: {file_path}")
# Guess the mime type (e.g., audio/mpeg for mp3, audio/wav for wav)
mime_type, _ = mimetypes.guess_type(file_path)
if mime_type is None:
mime_type = "audio/mpeg" # Fallback to mp3 just in case
with open(file_path, "rb") as audio_file:
encoded_string = base64.b64encode(audio_file.read()).decode('utf-8')
return f"data:{mime_type};base64,{encoded_string}"
Providing the audio to the model works very similarly to images.
We load them as a base 64 data URI and submit them to the model in the contents parameter, like so:
response = client.content_generation.tasks.create(
model="dreamina-seedance-2-0-260128",
content=[
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": utils.load_image("./images/cat.png")
},
"role": "reference_image"
},
{
"type": "audio_url",
"audio_url": {
"url": utils.load_audio("./audio/voice.mp3")
},
"role": "reference_audio"
}
],
ratio="16:9",
duration=11,
watermark=True,
resolution="480p",
)
This is the video Seedance 2 generated:
Overall, I think that it performed quite well.
Somehow, towards the end, the character started crying, which is not really what I was expecting from an angry emotion.
But note that Seedance 2 didn’t receive the audio prompt; it was only given the final audio file.
The full script can be found here.
Seedance 2.0 Project Idea: Building an AI Video Transition Tool
If you want a fun project to reinforce what you learned here, I suggest writing a Python script that takes two videos as input and:
- Extracts the last frame from the first video.
- Extracts the first frame from the second video.
- Uses the video interpolation we learned here to generate a transition between the two videos.
- Finally, the script merges the three videos into a single one.
Conclusion
In this article, we explored how to use the Seedance 2 API to generate videos from text, images, and audio.
We covered setting up the environment, connecting to the API, and managing the various media inputs required to bring ideas to life. It is a straightforward process, and the ability to combine these inputs opens up some interesting creative possibilities.
Overall, I find that the model performs reliably and produces solid results.
However, it does not feel like a breakthrough or a major leap forward when compared to other video generation models currently on the market. It gets the job done, but it does not necessarily change the game.
Finally, while I fully understand the reasoning behind the safety guidelines that restrict the use of real human faces in inputs, I found that these rules significantly limit the practical use of the model.
When I tried to use the tool to enhance my own content, it consistently rejected any video that featured my face.
This creates a hurdle for creators who want to build personalized media, and it remains a point to consider if you are looking to use this tool for professional or personal projects.
Seedance 2.0 FAQs
What is Seedance 2?
Seedance 2 is a multimodal generative AI model from ByteDance that functions as a "multimodal director," capable of processing text, images, video, and audio inputs simultaneously to create cinematic video sequences.
Is there a free tier for Seedance?
No, there is no free tier; activating and using the Seedance 2 API requires purchasing model-specific credits through the BytePlus Ark console.
Can I use reference images containing realistic human faces in my Seedance videos?
No, the API consistently rejects inputs featuring realistic human faces, even if they are AI-generated, often returning an error. Converting characters into cartoon-style images is a suggested workaround.
Is Seedance 2 multimodal?
Yes, Seedance 2 is a multimodal model, meaning it can process and integrate text, images, video, and audio inputs to create synchronized video and audio content.



