
Google's Veo 3: A Guide With Practical Examples

Learn how to use Veo 3 to create a spec ad, maintain character consistency across different shots, and gain modular control with the Ingredients feature.
May 22, 2025  · 12 min read

Google just launched Veo 3, its latest AI video generator. What stood out to me right away is that it offers native audio output. You can generate full video clips with sound baked in—dialogue, ambient effects, background music. That’s something I haven’t seen yet in Runway or Sora. At this point, I’d say Veo 3 is one step ahead.

Now, I’ve seen enough AI video demos to know they often oversell. They look polished, but as soon as your prompt drifts into unfamiliar territory relative to the training data—a strange setting, an unusual character, or something with too much subtlety—most models break.

But I took Veo 3 for a spin, and I’ll say this: it’s very good. Below, I’ll walk you through how it works and show you some of the clips I managed to create. I think you’ll be impressed.


What Is Veo 3? 

Before we get hands-on with examples, let's quickly cover what Veo is and what's new in this version.

Veo 3 is Google’s latest AI video generation model, announced at Google I/O 2025. It transforms text or image prompts into high-definition videos, now with native audio integration. This means Veo 3 can generate synchronized dialogue, ambient sounds, and background music, producing clips that feel remarkably lifelike.

Here’s an example:

At the moment, Veo 3 is only available in the U.S. and only through Flow, Google’s new AI-powered filmmaking interface. To access it, you’ll need an AI Ultra plan, which costs $250/month (about $272 with tax).

Let’s start building!

Creating an Ad

For my first test, I wanted to create a one-shot ad for a fictional mint brand called Mintro. The idea: something short, punchy, and memorable. I imagined an awkward, relatable moment—something that could work as a quick scroll-stopper.

Here’s the setup: two work colleagues stuck in a crowded elevator, face-to-face, the kind of space where confidence (and fresh breath) matters. To break the tension, one drops a line that’s equal parts tragic and hilarious:

“I once sneezed in the all-hands and clicked ‘share screen’ at the same time. No survivors.”

Then the ad would cut to the Mintro logo, along with the tagline:

“Approved for elevator talk.”

If you want to follow along, use the visual instructions in this image to create a video with Veo 3:

[Image: How to create a video with Veo 3]

Let’s start with this prompt and see what we get:

Prompt:

A crowded corporate elevator during morning rush hour. Two well-dressed colleagues stand face-to-face, uncomfortably close due to the packed space. One, maintaining a straight face, leans in slightly and says, “I once sneezed in the all-hands and clicked ‘share screen’ at the same time. No survivors.” The other tries to suppress a laugh. The elevator dings, and doors open to a bustling office floor.

The first version looked promising, but there were a few things that didn’t quite land.

For one, everyone in the elevator was looking at the main characters—which pulled focus in the wrong way. I wanted the surrounding people to stay in their own heads, like most of us do in the morning commute. Ideally, someone’s checking their phone, another person looks lost in thought, maybe someone adjusts their bag—but no one should be watching the interaction.

Another issue: the woman puts her hand to her nose, which subtly implies the guy’s breath smells bad. That completely undermines the point of the ad—this is supposed to be about confidence from having fresh breath. That gesture had to go.

The setting also felt off. For some reason, the elevator opened straight into an office space, which isn’t how offices are laid out. Elevators usually open into a hallway or a lobby, not directly into someone’s workstation. It’s a small detail, but it made the scene feel weirdly artificial.

On top of that, captions showed up in the video, which I didn’t ask for—and they were wildly misspelled. And finally, the soundscape inside the elevator was too dead. It needed something subtle, like ambient elevator music from overhead speakers, to make the environment feel real.

With these notes in mind, I made about five iterations until I landed on a version that felt okay. Not perfect, but much closer to what I was aiming for.

Here’s the revised prompt I used:

Prompt:

A very crowded office elevator during morning rush hour. The doors are closed at the start of the video, and as they begin to slowly open, we hear soft elevator music from the ceiling speakers and a gentle mechanical hum. The camera holds a single, continuous, eye-level shot, focused tightly on two well-dressed colleagues standing face-to-face — uncomfortably close due to the packed space. Just as the elevator doors are halfway open, the man calmly and confidently says: “I once sneezed in the all-hands and clicked ‘share screen’ at the same time. No survivors.” The woman reacts with genuine laughter — amused but never exaggerated — and she never speaks, recoils, touches her face, or steps back. Around them, the other elevator passengers remain relaxed and detached: one scrolls on their phone, another stares forward in thought, someone else shifts their bag — but no one looks at or reacts to the main characters. The doors continue to open fully, and at the end of the shot, the two colleagues step out of the elevator while the camera stays fixed in place. The characters never look into the camera. Do not include any captions, subtitles, or on-screen text.

This version got most of the blocking and tone right. Still, a few small issues remained:

  • The elevator doors opened a little too quickly, which felt jarring.
  • The audio still felt too quiet, even with the elevator music prompt included.

In my experience with AI, it takes one minute to get 90% of the way there, and one hour to get the last 10% right—though to be honest, you almost never get it exactly how you want. So I brought the draft into DaVinci Resolve and did the rest manually. It took about 15 minutes of light editing—just some fades, background music, and the final Mintro logo with the tagline.

The logo itself was generated using Whisk, Google’s design tool that runs on Imagen 4 under the hood (you can also find it inside Gemini if you prefer working from the app). The output was clean enough that I could drop it in without needing to tweak it.

With those edits, the ad was ready. It’s short, weird, and—hopefully—memorable.

Creating a Multi-Shot Scene with Character Consistency

Now I want to show you how to build a multi-shot scene with character consistency—meaning the same character keeps their face and appearance from one shot to the next. That might sound basic, but in AI video generation, this kind of continuity is still tricky to pull off.

Just to clarify: a scene is a unit of story with continuity in time and space. It can be made up of one or several shots, depending on how you want to break it up. Once you understand that structure, it becomes easier to build full scenes—and eventually stitch them together into something like a short film.

To demonstrate, I’ll create a very quick story inspired by what’s often credited as one of the greatest pieces of flash fiction ever written (supposedly by Hemingway):

For sale: baby shoes, never worn.

That’s the emotional core I want to build around. I imagined a two-shot micro-narrative to bring this line to life:

  • Shot 1: A woman in her late 30s opens a hallway closet filled with old coats, folded linens, and a few unlabeled cardboard boxes. She pulls one of the boxes down gently and kneels on the floor. She opens the box and carefully unwraps a small item inside: a pair of pristine white baby shoes, nestled in tissue paper.
  • Shot 2: A few minutes later, in the kitchen. The woman sits alone at the kitchen table, phone in hand. The camera holds a still, medium-close side angle. She places the baby shoes on the table beside her and begins typing a listing on her phone. Text on the phone screen: “For sale: baby shoes, never worn.”

This time, I’m not going to iterate for a polished, cinematic result. My goal is simply to show what’s possible with this tool—how to establish tone and maintain character appearance across multiple shots.

Let’s start by generating the first shot normally (just like we generated the ad shot).

Prompt:

Interior of a quiet, lived-in home during early morning. Natural light filters softly through a hallway window. A woman in her late 30s opens a hallway closet filled with old coats, folded linens, and a few unlabeled cardboard boxes. She pulls one of the boxes down gently and kneels to the floor. The camera remains still at a medium-wide angle, eye-level. She opens the box and carefully unwraps a small item inside: a pair of pristine white baby shoes, nestled in tissue paper. She sits back on her heels, holding the shoes in her lap. Her expression is unreadable — not sad, just present and still. The shot is quiet and unhurried. No music. Emphasize natural ambience — soft house sounds, the creak of the closet door, cardboard rustling, and subtle distant details like a ticking clock or a bird outside the window. The moment should feel hushed and real. Visual style: warm, grounded realism with natural lighting. Avoid cinematic over-stylization. Maintain a single, continuous shot without cuts or zooms. Do not include any on-screen text or captions.

Not bad at all. I like the framing and the color, and the sound is decent. The acting isn’t great—there’s not much emotion—but let’s move past that.

Let’s say we now want to move to the next shot in the kitchen. Our best chance at maintaining character consistency—keeping the same face, outfit, and general appearance—is to use the Scene Builder.

Once you’re satisfied with your first shot, click Add to scene:

A timeline will open up. Click the plus sign, and then choose between:

  • Jump to: cuts to a new shot (“this happens, and then the scene jumps to…”)
  • Extend: continues the current shot (“this happens, and then…”)

For this example, I need a cut, so I will choose Jump to and then use this prompt (I got this after a few iterations—this feature definitely needs improvement):

Prompt:

In the kitchen a few minutes later. Sunlight filters gently across the table and floor, creating a calm, quiet atmosphere. Quiet household ambience — the soft hum of the refrigerator, a faint creak of the chair, gentle taps on the phone screen. No music or external voices. The woman sits alone at the kitchen table, phone in hand. The camera holds a still, medium-close side angle. She places the baby shoes on the table beside her and begins typing a listing on her phone. The camera cuts to an over-the-shoulder shot or tight insert showing the phone screen: “For sale: baby shoes, never worn.” She stares at the text for a long moment, thumb hovering over the post button. Her eyes begin to glisten, but she quickly blinks it back. She doesn’t cry — instead, she locks the phone, sets it face-down, and exhales, steadying herself. Her expression is restrained and unreadable, but her body language says everything: this is not easy. Do not include any on-screen subtitles.

Prompt adherence was low—the tone and composition didn’t match what I had in mind. That said, the character consistency was decent: same haircut, similar facial structure, but the clothes changed.

I also noticed some visual artifacts in the output (check the shoes). And even though I expected a single shot, I got three separate cuts in one generation. Later, I realized I had unintentionally suggested a second cut in the prompt, so that part’s on me—but I still have no idea where the third shot came from.

On top of that, exporting from Scene Builder removed the audio entirely. I’m not sure if that’s a bug or just a limitation of the current setup, but there doesn’t seem to be a straightforward fix. You can download each shot individually, though—so I just stitched them back together in DaVinci Resolve.

There’s still a lot of work for Google to do on the Scene Builder feature, but this is promising!

Modular Control With Ingredients to Video

One of the more experimental (and fun) features inside Flow is Ingredients to Video. It gives you modular control: you generate individual elements—called ingredients—and then combine them into a scene.

You can create ingredients using image generation, although image upload isn’t supported yet. Here’s an example from the Google team:

For this test, I wanted to try something a little absurd—a funny, Kafkaesque short:

A bug with a human face drives an SUV. But here’s the twist (as if that weren’t absurd enough): the driver’s seat is a king’s throne.

First, let’s select the Ingredients to Video option:

[Image: Ingredients to Video feature in Veo 3]

I started by generating the three ingredients one by one: the chair, the SUV, and the bug.

[Image: Generating ingredients in Veo 3]

Unfortunately, this feature currently runs on Veo 2, not Veo 3. Technically, you can select Veo 3 from the dropdown, but it always auto-switches back to Veo 2 during generation and shows this warning:

[Image: Veo 3 auto-switch warning]

As expected, the output quality was underwhelming:

Prompt:

A bug with a human face calmly drives an SUV, seated on an oversized king’s throne.

That said, two of the three ingredients—especially the bug and the chair—looked surprisingly good. The SUV, not so much…

With Veo 3’s capabilities, this setup would likely have been much stronger. For now, this mode shows promise, but it’s not quite there yet.

Frames to Video

The idea behind Frames to Video is this: you provide the model with a first and last frame, and it tries to animate a transition between them (through a camera movement you can control). You can either generate these frames from a prompt or (eventually) upload them yourself—image upload isn’t available yet.

[Image: Frames to Video in Veo 3]

Like the Ingredients feature, this mode automatically defaults to Veo 2, which limits the quality significantly. I wasn’t able to generate anything particularly useful with it.

In the end, I used it to animate a single shot of a chameleon. I set the same image as both the starting and ending frame and asked for a dolly-in camera movement—but that part wasn’t respected in the final render.

Prompt:

A chameleon sits motionless on a branch, eyes slowly scanning in opposite directions as it waits patiently for prey.

Veo 3 Best Practices

When you first get access to Veo 3 through Flow, you’ll start with 12,500 credits. Each video generation consumes a chunk of that total—150 credits per generation with Veo 3—so it’s worth being strategic from the start.

My advice: think carefully about your prompts and generate only one output at a time. You’ll need to spread those credits out across the month, and each generation takes time—often 2 to 3 minutes or more. That makes iteration relatively slow, so trial-and-error isn’t cheap or fast.
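To make the budget concrete, here's a quick back-of-the-envelope sketch in Python (assuming the 150-credits-per-generation rate holds and the 12,500 credits cover one month):

```python
# Rough Veo 3 credit budget, using the figures mentioned above.
MONTHLY_CREDITS = 12_500
CREDITS_PER_GENERATION = 150  # Veo 3 cost per video generation

max_generations = MONTHLY_CREDITS // CREDITS_PER_GENERATION
per_day = max_generations / 30  # assuming a 30-day month

print(f"Max generations per month: {max_generations}")  # 83
print(f"Average per day: {per_day:.1f}")                # 2.8
```

In other words, you get roughly 83 generations a month, or just under three per day—which is why one careful, well-thought-out prompt beats four sloppy ones.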

For prompt crafting, Google provides a Vertex AI video generation prompt guide that offers insights into structuring effective prompts for Veo. This guide emphasizes the importance of clear, descriptive prompts and provides examples to help you get started.
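To make the "clear, descriptive prompt" advice concrete, here's a minimal, hypothetical helper (the field names are my own convention, not part of any Veo API—Veo simply accepts free-form text) that assembles the kinds of elements the guide recommends into a single prompt string:

```python
def build_veo_prompt(subject, action, camera, audio, negatives=()):
    """Assemble a descriptive video prompt from named elements.

    Hypothetical sketch: concatenates the pieces in a consistent
    order (subject, action, camera, audio, negative instructions),
    normalizing each into its own sentence.
    """
    parts = [subject, action, camera, audio]
    parts += [f"Do not include {n}" for n in negatives]
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p)

prompt = build_veo_prompt(
    subject="A crowded corporate elevator during morning rush hour",
    action="Two colleagues stand face-to-face; one delivers a joke",
    camera="Single continuous eye-level shot, no cuts or zooms",
    audio="Soft elevator music and a gentle mechanical hum",
    negatives=["captions", "on-screen text"],
)
print(prompt)
```

Whether you use a helper like this or just write the prompt by hand, the point is the same: cover each element explicitly rather than hoping the model infers it.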

If you’re looking for additional guidance, the Runway Gen-3 Alpha Prompting Guide is a valuable resource. It offers detailed strategies for crafting prompts that yield high-quality video outputs, which can also be beneficial when working with Veo 3.

Conclusion

I haven’t been this amazed by an AI breakthrough since GPT-4o’s image generation.

Veo 3 delivers something that feels fundamentally new: coherent, sound-enabled video from natural language prompts. That alone sets it apart from everything else I’ve tested.

Sure, it has its flaws—prompt drift, lack of full Veo 3 access in key tools like Scene Builder, and occasional visual glitches—but the core experience is genuinely exciting.

What stands out is how close it already feels to a usable creative pipeline. With a bit of editing and some careful prompting, you can go from idea to storyboard to a working short project in a few hours. Add in character consistency (even if it’s a bit fragile), audio baked into the output, and support for modular workflows, and this starts to look like a serious tool.


Author: Alex Olteanu

I’m an editor and writer covering AI blogs, tutorials, and news, ensuring everything fits a strong content strategy and SEO best practices. I’ve written data science courses on Python, statistics, probability, and data visualization. I’ve also published an award-winning novel and spend my free time on screenwriting and film directing.
