
Seedance 2.0 API Guide: Step-by-Step Video Generation Tutorial

Learn how to use the Seedance 2 API with this step-by-step Python tutorial. Master text-to-video, image interpolation, and multimodal inputs via BytePlus Ark.
June 28, 2026

In this tutorial, I will demonstrate how to use the BytePlus Ark API to interact with Seedance 2 using Python. 

Seedance 2 is a powerful generative AI model from ByteDance that functions as a "multimodal director," capable of processing text, images, video, and audio inputs simultaneously to create cinematic video sequences with synchronized audio. Whether you are looking to animate static images or bring complex text prompts to life, this guide will walk you through the setup and implementation. 

Generating a Seedance 2 API Key

To get started, we need to create a BytePlus account and an API key. To do so:

  1. Go to BytePlus’s official website and create an account.
  2. Go to the API key creation page and click “Create API Key”.

This key authenticates our requests to the ModelArk API, BytePlus's official API.

The best practice is to store the key in a file named .env in the same folder where we write our Python scripts. Keep the key secret: anyone who has it can make API requests that are billed to your account.

Paste the API key into the .env file with the following format:

ARK_API_KEY=<paste_api_key_here>

Activating the Seedance 2 model and pricing structure

To be able to use Seedance 2, we first need to activate the model: 

Interface screenshot showing how to activate the Seedance 2.0 model.

Note that using the API isn't free and activating Seedance 2 requires purchasing credits for the model. For this article, I purchased a 7 million token pack for Seedance 2 for $30.10. 

Screenshot of the Seedance 2 token purchase interface.
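For budgeting, it helps to turn the pack price into an effective rate per million tokens. The sketch below only does that arithmetic. The helper names are hypothetical, and it assumes cost scales linearly with tokens at this pack's rate, which may not hold for other packs or models.

```python
PACK_PRICE_USD = 30.10   # price paid for the pack in this article
PACK_TOKENS = 7_000_000  # tokens included in that pack


def usd_per_million_tokens(price=PACK_PRICE_USD, tokens=PACK_TOKENS):
    """Effective price of one million tokens for a given pack."""
    return price / tokens * 1_000_000


def estimate_cost(tokens_used, price=PACK_PRICE_USD, tokens=PACK_TOKENS):
    """Rough cost of a job, assuming linear pricing at the pack's rate."""
    return tokens_used * price / tokens


print(f"${usd_per_million_tokens():.2f} per million tokens")
```

Pass the token count reported for your own tasks to estimate_cost() to see roughly how much of the pack a video consumed.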

Unfortunately, the tokens are model-specific. This means that if we buy tokens for the fast or mini versions of Seedance 2, we can’t use them with the base model.

Below is a table with the pricing plans for the API:

How to Generate Videos With The Seedance 2 API In Python

I’ve included the complete scripts corresponding to the code created in this tutorial in this GitHub repository.

In this section, we learn how to make a text-to-video request to generate this video:

1. Setting up your Python environment and SDK

To connect with the BytePlus Ark API using Python, we'll use the official byteplus-python-sdk-v2 package. 

The easiest way to set up the dependencies for this project is to clone the repository, create an Anaconda environment, and install the packages listed in requirements.txt, like so:

git clone git@github.com:fran-aubry/seedance2-tutorial.git
cd seedance2-tutorial
conda create --name seedance2 python=3.11
conda activate seedance2
pip install -r requirements.txt

2. Initializing the BytePlus Ark Python client

Create a new script called text_to_video.py in the same folder as the .env file we created before.

First, we need to import the necessary packages:

import os
from byteplussdkarkruntime import Ark
from dotenv import load_dotenv

Next, we load the API key from the .env file using the load_dotenv() function:

load_dotenv()
API_KEY = os.getenv("ARK_API_KEY")
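Note that os.getenv() returns None when the variable is missing, so a typo in .env would only surface later as an authentication error. Here is a minimal sketch of a fail-fast check. require_env is a hypothetical helper of my own, not part of the SDK or python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Usage, after calling load_dotenv():
# API_KEY = require_env("ARK_API_KEY")
```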

Now we initialize the Ark client, which allows us to make requests to the BytePlus Ark API. 

We pass the API key we just loaded and set the appropriate base URL for international API access:

client = Ark(
    api_key=API_KEY,
    base_url="https://ark.ap-southeast.bytepluses.com/api/v3"
)

Finally, we set a prompt and use the content_generation.tasks.create() function from the client to generate the video:

prompt = """
A cinematic close-up of a weary but hopeful female astronaut inside a dimly lit, dusty spaceship cabin. 
Soft blue light illuminates her face. 
She takes a deep breath, showing subtle emotional relief, looks directly into the camera, and says: 
'We finally made it. The atmosphere is stable.' 
The ambient sound of a low, rhythmic spaceship engine hums in the background.
"""
response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        }
    ],
    generate_audio=True,
    ratio="16:9",
    duration=8,
    watermark=True,
    resolution="480p",
)
task_id = response.id
print(f"Task successfully submitted! Task ID: {task_id}")

The request doesn't return a video immediately because generation takes some time. The response variable is therefore not the video itself but an object describing the video generation task.

In particular, it contains the task identifier, which lets us track the progress of the generation and download the video once the model has finished.

3. Tracking video generation status (Task polling)

We can retrieve the status of a task by providing the task identifier to the content_generation.tasks.get() function, like so:

task = client.content_generation.tasks.get(task_id=task_id)
status = task.status
print(f"Status: {status}")

To make it easier to track video progress, we create a new script utils.py in the same folder and add this function that tracks the video progress given the client instance and task identifier:

import time
import requests  # used by download_video() below

def poll_task(client, task_id, poll_interval=5):
    """Block until the task finishes and return the URL of the generated video."""
    while True:
        try:
            task_status = client.content_generation.tasks.get(task_id=task_id)
            # The SDK returns an object; fall back to dict access just in case
            status = getattr(task_status, "status", None) or (
                task_status.get("status") if isinstance(task_status, dict) else None
            )

            if status == "succeeded":
                print("\nTask completed successfully.")
                return task_status.content.video_url

            if status == "failed":
                error_details = getattr(task_status, "error", "Unknown error occurred during processing.")
                raise RuntimeError(f"Task failed: {error_details}")

            # Still queued or running: print a progress dot and wait
            print(".", end="", flush=True)
            time.sleep(poll_interval)
        except RuntimeError:
            # A failed task is final, so we don't retry
            raise
        except Exception as e:
            # Anything else (e.g. a network hiccup) is treated as transient
            print(f"\nWarning: Error polling task (will retry): {e}")
            time.sleep(poll_interval)
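One caveat: poll_task() loops forever if a task never reaches "succeeded" or "failed". Below is a sketch of a variant that gives up after a deadline. The timeout and sleep parameters are my own additions, and it relies on the same two status strings used above. To keep it short, it doesn't retry on transient errors.

```python
import time


def poll_task_with_timeout(client, task_id, poll_interval=5, timeout=600, sleep=time.sleep):
    """Like poll_task(), but raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        task = client.content_generation.tasks.get(task_id=task_id)
        status = getattr(task, "status", None)

        if status == "succeeded":
            return task.content.video_url
        if status == "failed":
            error = getattr(task, "error", "unknown error")
            raise RuntimeError(f"Task failed: {error}")

        # Any other status is treated as "still in progress"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still '{status}' after {timeout}s")
        sleep(poll_interval)
```

The sleep argument exists only so the function can be exercised with a fake client without actually waiting.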

4. Downloading the generated video with Python requests

The poll_task() function we implemented above returns the URL of the generated video once the task succeeds.

Now that we can track the video generation process, all we need is a way to download it when it finishes generating. 

Since the BytePlus Ark API returns a direct URL to the generated video file within the task response, we can download it using the requests library.

Here's a function that does this:

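import os

def download_video(url, output_path="video.mp4"):
    """Stream the video at `url` to `output_path`. Returns True on success."""
    try:
        # Create the destination folder (e.g. ./videos) if it doesn't exist yet
        os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)

        print(f"Downloading video from {url} to {output_path}...")
        # Stream the response so the whole file is never held in memory
        response = requests.get(url, stream=True, timeout=60)
        response.raise_for_status()

        with open(output_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Video successfully downloaded to '{output_path}'")
        return True
    except Exception as e:
        print(f"Error downloading video: {e}")
        return False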

Complete Python script: Text-to-video with Seedance 2

Putting all of these together, here’s a full script for generating a video from a text prompt using Seedance 2:

import os
from byteplussdkarkruntime import Ark
from dotenv import load_dotenv
import utils

load_dotenv()
API_KEY = os.getenv("ARK_API_KEY")

client = Ark(
    api_key=API_KEY,
    base_url="https://ark.ap-southeast.bytepluses.com/api/v3"
)

prompt = """
A cinematic close-up of a weary but hopeful female astronaut inside a dimly lit, dusty spaceship cabin. 
Soft blue light illuminates her face. 
She takes a deep breath, showing subtle emotional relief, looks directly into the camera, and says: 
'We finally made it. The atmosphere is stable.' 
The ambient sound of a low, rhythmic spaceship engine hums in the background.
"""

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        }
    ],
    generate_audio=True,
    ratio="16:9",
    duration=8,
    watermark=True,
    resolution="480p",
)

task_id = response.id
print(f"Task successfully submitted! Task ID: {task_id}")

# Wait for the video to finish generating
video_url = utils.poll_task(client, task_id)

utils.download_video(video_url, "./videos/space.mp4")

Seedance 2 API core parameters reference guide

Below is a list of accepted values for the core parameters of a video generation request:

  • duration: from 4 to 15 seconds.
  • aspect_ratio (passed as ratio in the requests in this tutorial): 16:9, 4:3, 1:1, 3:4, 9:16, and 21:9.
  • resolution: 480p, 720p, 1080p, and 4k.
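Since every request costs tokens, it can be worth checking these values locally before submitting. This is a minimal sketch based only on the values listed above. validate_params is a hypothetical helper, and the accepted values may change on the BytePlus side.

```python
ALLOWED_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
MIN_DURATION, MAX_DURATION = 4, 15  # seconds


def validate_params(duration, ratio, resolution):
    """Raise ValueError if a parameter is outside the values listed above."""
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(
            f"duration must be between {MIN_DURATION} and {MAX_DURATION} seconds, got {duration}"
        )
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {sorted(ALLOWED_RATIOS)}, got {ratio!r}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}, got {resolution!r}"
        )


# Usage: call this right before client.content_generation.tasks.create(...)
validate_params(duration=8, ratio="16:9", resolution="480p")
```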

Advanced Guide: Seedance 2.0 Image-to-Video Capabilities

In the previous section, we learned how to generate a video using a text prompt. Seedance 2 also supports image, video, and audio inputs, so let’s start with images.

Using reference images for character consistency

To test image inputs, I generated two characters using AI to use in a sitcom-like scene.

We can provide image references to the model by adding them to the content field of the request, like so:

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/male-character.png")
            },
            "role": "reference_image"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/female-character.png")
            },
            "role": "reference_image"
        }
    ],
    generate_audio=True,
    ratio="16:9",
    duration=15,
    watermark=True,
    resolution="480p",
)
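The request above relies on utils.load_image(), which isn't shown in this article. Below is a plausible sketch, assuming the helper returns the image as a Base64 data URI in the same way the audio loader in this tutorial does. The version in the repository may differ, and the PNG fallback is my own assumption.

```python
import base64
import mimetypes
import os


def load_image(file_path):
    """Read a local image and return it as a Base64 data URI."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Image file not found at: {file_path}")

    # Guess the MIME type from the extension (e.g. image/png, image/jpeg)
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        mime_type = "image/png"  # assumed fallback

    with open(file_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded}"
```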

For this video, I used the following prompt:

Context: A cozy, warmly lit independent coffee shop counter with wooden accents.
Reference: 
- The male barista's face and clothing follow @Image1.
- The female customer's face and styling follow @Image2.

Action & Dialogue (10 Seconds Total):

0:00 - 0:04: 
Medium close-up on the male barista (@Image1) standing behind the counter,
next to the coffee machine. He leans across the counter with a smooth, 
flirtatious smirk, acting like he is making a highly exclusive offer: 
"The WiFi password? It's actually my phone number: five-five-five, zero-one—"

0:04 - 0:07: 
Camera cuts to the female customer (@Image2), instantly interrupting him. 
She holds up her phone screen toward him, entirely unimpressed: 
"Your network is called 'Free_Guest_WiFi' and there is literally no password."

0:07 - 0:08: Quick cut back to the two-shot. 
Flustered, the barista becomes visibly embarrassed, his confident smirk 
instantly vanishing as he nervously breaks eye contact.

0:08 - 0:10: Camera cuts back to a medium shot of the female customer (@Image2). 
She lowers her phone, tapping her fingers lightly on the counter, 
and delivers her order with perfect deadpan clarity: 
"It will be an espresso please. No milk, no sugar, and no phone number."

Framing/Timing: 
Clean cuts to perfectly pace the comedic timing over exactly 10 seconds.

Tone/Audio: 
Lighthearted sitcom vibe. Synchronized voice generation for the dialogue, a subtle low cafe background hum, and the sound of the customer's fingers lightly tapping the wooden counter at the end.

Note that the prompt uses @Image1 and @Image2 to refer to the images in the order they were provided in the list.
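Because the @ImageN tags are positional, a prompt can silently reference an image that was never attached. A small sketch that catches this before the request is sent follows. The helper name is hypothetical, and it assumes tags are written exactly as @Image followed by a number, as in the prompt above.

```python
import re


def out_of_range_image_refs(prompt, num_images):
    """Return the @ImageN numbers in `prompt` that have no matching image."""
    referenced = {int(n) for n in re.findall(r"@Image(\d+)", prompt)}
    return sorted(n for n in referenced if not 1 <= n <= num_images)


# Example: two images attached, but the prompt also mentions @Image3
bad = out_of_range_image_refs("Face follows @Image1, outfit follows @Image3.", num_images=2)
print(bad)
```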

Unfortunately, when I tried to generate this video, I got an error saying: 

The request failed because the input image may contain a real person.

This happened consistently whenever I used a reference image containing a realistic human, even an AI-generated one.

To overcome this, I converted the characters into cartoons and tried again.

This time it worked, and the result was exactly what I wanted to create.

The full code for this example can be found in the image_to_video.py file.

Setting first and last frames for guided animation

The previous example showed how to use images as references by setting "role": "reference_image" in the request.

Seedance 2 supports two other image roles:

  • "first_frame": Instructs the model to use the provided image as the exact starting frame of the generated video. This is the standard setting for direct Image-to-Video generation.
  • "last_frame": Instructs the model to end the generated video on this specific image. Note: You can only use this if you are also providing a first_frame in the same request, effectively asking the model to interpolate the video between the two images.
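The rule that a last frame needs a first frame is easy to check locally before spending tokens on a request. The sketch below encodes only that rule. check_frame_roles is a hypothetical helper, and the API may enforce further constraints that I haven't verified.

```python
def check_frame_roles(content):
    """Raise ValueError if a last_frame is given without a first_frame.

    `content` is the list passed to tasks.create(). Returns the image roles found.
    """
    roles = [part.get("role") for part in content if part.get("type") == "image_url"]
    if "last_frame" in roles and "first_frame" not in roles:
        raise ValueError("last_frame requires a first_frame in the same request")
    return roles


# Example with placeholder URLs
content = [
    {"type": "text", "text": "A room being furnished."},
    {"type": "image_url", "image_url": {"url": "data:..."}, "role": "first_frame"},
    {"type": "image_url", "image_url": {"url": "data:..."}, "role": "last_frame"},
]
print(check_frame_roles(content))
```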

I tried this by generating a succession of initial frames showing a robot AI lab trying to recreate plants in a world where natural life has disappeared. 

For each frame, I generated a video using that frame as the first frame. Here is how to structure each request:

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/robot-lab1.jpeg")
            },
            "role": "first_frame"
        }
    ],
    generate_audio=False,
    ratio="16:9",
    duration=4,
    watermark=True,
    resolution="480p",
)

I generated the videos without audio because the audio wouldn’t be consistent, since each video was generated separately. Here’s the result of putting the five clips together:

The full script used to generate a video based on a starting frame can be found here.
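This article doesn't cover how the clips were joined. One common option is ffmpeg's concat demuxer, sketched below under the assumption that ffmpeg is installed and on the PATH. The function names and the clips.txt file name are my own. Stream copy (-c copy) avoids re-encoding, which should work here because the clips come from the same model with the same settings.

```python
import subprocess


def write_concat_list(clip_paths, list_path="clips.txt"):
    """Write the text file that ffmpeg's concat demuxer reads."""
    with open(list_path, "w", encoding="utf-8") as f:
        for path in clip_paths:
            escaped = path.replace("'", "'\\''")  # escape single quotes for ffmpeg
            f.write(f"file '{escaped}'\n")
    return list_path


def concat_command(list_path, output_path):
    """Build the ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output_path]


def concat_clips(clip_paths, output_path, list_path="clips.txt"):
    """Join the clips in order into a single video file."""
    write_concat_list(clip_paths, list_path)
    subprocess.run(concat_command(list_path, output_path), check=True)
```

Relative paths in the list file are resolved relative to the list file's location, so absolute paths are the safer choice.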

Image interpolation with last_frame

In this example, we provide both a first_frame and last_frame to generate a video between the two.

I generated two images of the same room, one empty without furniture and the other with the room fully furnished. The goal is to generate a video that animates the room being furnished.

The prompt I used described how I want the furniture to appear in the video:

A dynamic, fast-paced sequence where furniture pops into existence piece by piece. First, a rug rapidly unfolds itself, then the sofa drops from just above the floor with a heavy, realistic bounce. Lamps, artwork, and plants quickly spring into reality one after another. Vibrant lighting, photorealistic materials, lively and snappy energetic rhythm.

Here’s the request:

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/room-start.png")
            },
            "role": "first_frame"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/room-end.png")
            },
            "role": "last_frame"
        }
    ],
    generate_audio=True,
    ratio="16:9",
    duration=8,
    watermark=True,
    resolution="480p",
)

And here’s the result:

The full script can be found here.

Seedance 2.0 Multimodal Inputs: Adding Audio and Video References

As a multimodal model, Seedance 2 goes beyond text and image inputs: it also accepts video and audio. These can serve as stylistic references, drive the motion of a scene, or provide footage to edit.

This example from their official website illustrates this very well. They took a scene of two actors fighting and used that to drive the action, replacing the actors with reference images:

Submitting a video input

As with images, video inputs are provided in the content parameter of the request. 

However, as of this writing, video inputs only support URLs. This means that if we want to use a video we have locally on our machine, we’ll need to host it first.

For this example, I used this free-to-use skating video by Marc Espejo. The idea was to use Seedance 2 to add some VFX. Here’s the result:

To generate this, I used the following request:

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        },
        {
            "type": "video_url", 
            "video_url": {
                "url": "https://www.pexels.com/download/video/38213009/"
            },
            "role": "reference_video" 
        }
    ],
    generate_audio=True,
    ratio="16:9",
    duration=6,
    watermark=True,
    resolution="480p",
)

The full script can be found here. The prompt I used was:

The original skater, environment, and camera movement must remain 
unchanged and untouched. 

Only add gaming-style visual effects: intense glowing sparks and electric arcs 
streaming from the skateboard-rail contact point during the grind, and a 
brilliant energy shockwaves and dust burst at the jump/pop. 
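Because video inputs must be hosted URLs, it is easy to pass a local path by mistake. Here is a small sketch that builds the video entry of the content list and rejects anything that isn't an http(s) URL. video_reference is a hypothetical helper that mirrors the entry format used in the request above.

```python
from urllib.parse import urlparse


def video_reference(url, role="reference_video"):
    """Build a video content entry, rejecting anything that isn't an http(s) URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Video inputs must be hosted URLs, got: {url!r}")
    return {"type": "video_url", "video_url": {"url": url}, "role": role}


entry = video_reference("https://www.pexels.com/download/video/38213009/")
```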

Submitting an audio input

To test audio inputs, I generated an image for a character and used Fish Audio to generate an audio track with different emotions.

For the audio, I wanted a few different emotions to see if Seedance 2 was able to identify those when generating the video. 

Fish Audio lets us specify emotions with [emotion] tags in the prompt, so it was perfect for the job. Here’s the audio prompt I used:

[sighing] Nothing much is happening today. [surprised] Wait a minute... what are those on my ears?[laughing] Oh my gosh, I can't believe I got piercings last night! [angry] But hold on, I specifically told you not to let me do it!

To load the audio file, I wrote this function that I added to utils.py:

import base64
import mimetypes
import os

def load_audio(file_path):
    """Read a local audio file and return it as a Base64 data URI."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found at: {file_path}")

    # Guess the mime type (e.g., audio/mpeg for mp3, audio/wav for wav)
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        mime_type = "audio/mpeg"  # Fallback to mp3 just in case

    with open(file_path, "rb") as audio_file:
        encoded_string = base64.b64encode(audio_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded_string}"

Providing the audio to the model works very similarly to images. 

We load the audio file as a Base64 data URI and submit it to the model in the content parameter, like so:

response = client.content_generation.tasks.create(
    model="dreamina-seedance-2-0-260128",
    content=[
        {
            "type": "text",
            "text": prompt,
        },
        {
            "type": "image_url",
            "image_url": {
                "url": utils.load_image("./images/cat.png")
            },
            "role": "reference_image"
        },
        {
            "type": "audio_url",
            "audio_url": {
                "url": utils.load_audio("./audio/voice.mp3")
            },
            "role": "reference_audio"
        }
    ],
    ratio="16:9",
    duration=11,
    watermark=True,
    resolution="480p",
)

This is the video Seedance 2 generated:

Overall, I think that it performed quite well. 

Somehow, towards the end, the character started crying, which is not really what I was expecting from an angry emotion. 

But note that Seedance 2 didn’t receive the audio prompt; it was only given the final audio file.

The full script can be found here.

Seedance 2.0 Project Idea: Building an AI Video Transition Tool

If you want a fun project to reinforce what you learned here, I suggest writing a Python script that takes two videos as input and:

  1. Extracts the last frame from the first video.
  2. Extracts the first frame from the second video.
  3. Uses the first-frame/last-frame interpolation we learned here to generate a transition clip between the two videos.
  4. Merges the three clips into a single video.
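As a starting point for the first two steps, here is a sketch that builds ffmpeg commands to extract the frames. It assumes ffmpeg is installed and on the PATH, and the function names are my own. The last-frame command seeks to one second before the end and keeps overwriting a single image, so the file ends up holding the final frame.

```python
import subprocess


def first_frame_command(video_path, image_path):
    """ffmpeg command that saves the first frame of a video as an image."""
    return ["ffmpeg", "-y", "-i", video_path, "-frames:v", "1", image_path]


def last_frame_command(video_path, image_path):
    """ffmpeg command that saves the last frame of a video as an image.

    -sseof -1 starts reading one second before the end of the file, and
    -update 1 makes ffmpeg overwrite the same image with each decoded frame.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", video_path, "-update", "1", image_path]


def extract_transition_frames(first_video, second_video,
                              start_image="transition-start.jpg",
                              end_image="transition-end.jpg"):
    """Extract the two frames that the transition clip should connect."""
    subprocess.run(last_frame_command(first_video, start_image), check=True)
    subprocess.run(first_frame_command(second_video, end_image), check=True)
    return start_image, end_image
```

The two images can then be sent as first_frame and last_frame, exactly as in the room example.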

Conclusion

In this article, we explored how to use the Seedance 2 API to generate videos from text, images, video, and audio.

We covered setting up the environment, connecting to the API, and managing the various media inputs required to bring ideas to life. It is a straightforward process, and the ability to combine these inputs opens up some interesting creative possibilities.

Overall, I find that the model performs reliably and produces solid results. 

However, it does not feel like a breakthrough or a major leap forward when compared to other video generation models currently on the market. It gets the job done, but it does not necessarily change the game.

Finally, while I fully understand the reasoning behind the safety guidelines that restrict the use of real human faces in inputs, I found that these rules significantly limit the practical use of the model. 

When I tried to use the tool to enhance my own content, it consistently rejected any video that featured my face. 

This creates a hurdle for creators who want to build personalized media, and it remains a point to consider if you are looking to use this tool for professional or personal projects.

Seedance 2.0 FAQs

What is Seedance 2?

Seedance 2 is a multimodal generative AI model from ByteDance that functions as a "multimodal director," capable of processing text, images, video, and audio inputs simultaneously to create cinematic video sequences.

Is there a free tier for Seedance?

No, there is no free tier; activating and using the Seedance 2 API requires purchasing model-specific credits through the BytePlus Ark console.

Can I use reference images containing realistic human faces in my Seedance videos?

No, the API consistently rejects inputs featuring realistic human faces, even if they are AI-generated, often returning an error. Converting characters into cartoon-style images is a suggested workaround.

Is Seedance 2 multimodal?

Yes, Seedance 2 is a multimodal model, meaning it can process and integrate text, images, video, and audio inputs to create synchronized video and audio content.


Author
François Aubry
Full-stack engineer & founder at CheapGPT. Teaching has always been my passion. From my early days as a student, I eagerly sought out opportunities to tutor and assist other students. This passion led me to pursue a PhD, where I also served as a teaching assistant to support my academic endeavors. During those years, I found immense fulfillment in the traditional classroom setting, fostering connections and facilitating learning. However, with the advent of online learning platforms, I recognized the transformative potential of digital education. In fact, I was actively involved in the development of one such platform at our university. I am deeply committed to integrating traditional teaching principles with innovative digital methodologies. My passion is to create courses that are not only engaging and informative but also accessible to learners in this digital age.