What Is OpenAI's Sora? How It Works, Examples, Features
Day 3 of the “12 Days of OpenAI” came with an exciting announcement: Sora AI is now available.
This text-to-video generative AI model looks incredibly impressive so far, introducing some huge potential across many industries. Here, we explore what OpenAI’s Sora is, how it works, some potential use cases, and what the future holds.
What Is Sora?
Sora is OpenAI's text-to-video generative AI model. That means you write a text prompt, and it creates a video that matches the description of the prompt. Here's an example from the OpenAI site:
PROMPT: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Sora Features
Sora has a few features that allow us to take more control of the video generation process. Let’s explore each.
Remix
The remix feature allows users to reimagine existing videos by altering their components without losing the essence of the original. Whether it’s changing colors, substituting backgrounds, or tweaking visual elements, remix provides a flexible way to experiment with a video’s appearance.
This feature is perfect for creators looking to refresh old content, tailor videos to specific themes, or explore variations for branding purposes.
For instance, consider the two videos below:
- Original video: "Open large doors into a library"
- Remix video: “Turn the library into a spaceship”
Re-cut
The re-cut feature allows creators to pinpoint and isolate the most impactful frames in a video, extending them in either direction to build out a complete scene. This tool is perfect for enhancing key moments, drawing attention to specific visuals, or ensuring a smoother flow between scenes. By focusing on the strongest frames, re-cut helps refine the storytelling process while giving creators greater control over pacing and emphasis.
Loop
The loop feature makes it easy to create seamless repetitions of video clips. Ideal for background visuals, music videos, or hypnotic animations, this tool ensures transitions between loops are smooth and natural. It allows creators to extend the duration of captivating moments or maintain a consistent rhythm for videos designed to play continuously.
Below, we see a flower that continuously blossoms and closes in a seamless loop, with no visible cuts in the transition:
Storyboard
The storyboard feature enables creators to generate specific shots at designated frame points along the timeline, offering precise control over the visual narrative. For example, using OpenAI’s demo, you can storyboard the following sequence of shots:
- Frames 0-114: “A vast red landscape with a docked spaceship in the distance.”
- Frames 114-324: “Looking out from inside the spaceship, a space cowboy stands center frame.”
- Frames 324-440: “Detailed close-up view of an astronaut’s eyes framed by a knitted fabric mask.”
Blend
The blend feature allows you to combine different video or style elements to create new compositions. By mixing footage, colors, or artistic approaches, it supports crafting visuals that feel distinct and fresh. This approach works well for experimental projects, mashups, or creative storytelling that explores unconventional ideas.
Below, we see a video that blends two videos:
- A video of snowflakes falling
- A video of flower petals falling
Style presets
The style presets feature provides a collection of predefined aesthetic templates that can be applied to videos. These presets make it easier to achieve a specific look, whether cinematic, vibrant and playful, or professional.
For instance, this is what the Film Noir preset looks like:
How Does Sora Work?
Like text-to-image generative AI models such as DALL·E 3, Stable Diffusion, and Midjourney, Sora is a diffusion model. That means it starts with each frame of the video consisting of static noise and uses machine learning to gradually transform the images into something resembling the description in the prompt.
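To make that loop concrete, here's a minimal Python sketch of reverse diffusion. It's illustrative only: `predict_noise` is a hypothetical stand-in for the trained neural network (OpenAI hasn't published Sora's implementation), and the tensor shapes and step count are arbitrary.

```python
import numpy as np

def predict_noise(frames, prompt, t):
    # Hypothetical stand-in for the trained network, which would estimate
    # the noise present in `frames`, conditioned on the prompt and timestep t.
    return np.zeros_like(frames)

num_steps = 50
# Start from pure static: (frames, height, width, channels)
frames = np.random.randn(16, 64, 64, 3)

for t in reversed(range(num_steps)):
    eps = predict_noise(frames, "A cartoon kangaroo disco dances.", t)
    frames -= eps / num_steps  # peel away a little predicted noise each step
# With a real trained network, `frames` would now depict the prompt.
```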
Solving temporal consistency
One area of innovation in Sora is that it considers several video frames at once, which solves the problem of keeping objects consistent when they move in and out of view. In the following video, notice that the kangaroo's hand moves out of the shot several times, and when it returns, the hand looks the same as before.
PROMPT: A cartoon kangaroo disco dances.
Combining diffusion and transformer models
Sora combines the use of a diffusion model with a transformer architecture, as used by GPT.
Regarding the combination of these two model types, Jack Qiao noted that "diffusion models are great at generating low-level texture but poor at global composition, while transformers have the opposite problem." That is, you want a GPT-like transformer model to determine the high-level layout of the video frames and a diffusion model to create the details.
In a technical article on the implementation of Sora, OpenAI provides a high-level description of how this combination works. In diffusion models, images are broken down into smaller rectangular "patches." For video, these patches are three-dimensional because they persist through time. Patches can be thought of as the equivalent of "tokens" in large language models: rather than being a component of a sentence, they are a component of a set of images. The transformer part of the model organizes the patches, and the diffusion part of the model generates the content for each patch.
Another quirk of this hybrid architecture is that to make video generation computationally feasible, the process of creating patches uses a dimensionality reduction step so that computation does not need to happen on every single pixel for every single frame.
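As an illustration of the patch idea, the NumPy sketch below carves a video tensor into three-dimensional "spacetime" patches, one row per patch-token. The dimensions and patch sizes are invented for the example (Sora's actual configuration isn't public), and a real pipeline would also apply the compression step described above.

```python
import numpy as np

video = np.random.randn(16, 64, 64, 3)  # (frames, height, width, channels)
pt, ph, pw = 4, 8, 8                    # patch extent in time, height, width

T, H, W, C = video.shape
patches = (
    video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
         .transpose(0, 2, 4, 1, 3, 5, 6)  # group the three patch axes together
         .reshape(-1, pt * ph * pw * C)   # one flat row per spacetime patch
)
print(patches.shape)  # (256, 768): 256 patch-tokens, each of dimension 768
```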
Increasing Fidelity of Video with Recaptioning
To faithfully capture the essence of the user's prompt, Sora uses a recaptioning technique also used in DALL·E 3. This means that before any video is created, GPT is used to rewrite the user prompt to include much more detail. Essentially, it's a form of automatic prompt engineering.
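Here's a hedged sketch of what recaptioning amounts to, using the OpenAI Python SDK. The model name and system prompt are assumptions for illustration; OpenAI hasn't disclosed which model or instructions Sora uses internally.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recaption(user_prompt: str) -> str:
    # Ask a GPT model to expand a terse idea into a detailed scene description.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; Sora's internal recaptioner is unspecified
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's video idea as a richly detailed scene "
                    "description covering subjects, setting, lighting, camera "
                    "movement, and mood."
                ),
            },
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

print(recaption("A cartoon kangaroo disco dances."))
```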
How Good Is OpenAI Sora?
As you can see from the examples provided so far, Sora seems to be an impressive tool, and we're only scratching the surface of what's possible. For example, check out the clip below, which offers a sample of what is possible when working with filmmakers and artists:
This short film feels like a genuine movie trailer, with a range of different shots, angles, and concepts on display, creating a fairly seamless video.
However, other examples shown by OpenAI team members are slightly less convincing (albeit still impressive). Check out the video below of the couple on a beach (this video was generated using the previous Sora model, and we plan to re-generate it once we gain access to the updated Sora version):
PROMPT: Realistic video of people relaxing at beach, then a shark jumps out of the water halfway through and surprises everyone.
While it clearly hits the main beats of the prompt, it's not a particularly convincing scene, and it falls firmly into the uncanny valley. The man's three hands, the shark that assembles from multiple parts at an unconvincing scale, the Exorcist-esque head swivel and shout from the woman: it's all a bit terrifying.
It’s likely that, as with generative images, there will be a degree of refining prompts and making allowances - it’s not going to create something perfect every time.
That being said, let's compare the above video to an example created from the exact same prompt using Runway's Gen-2 model:
As you can see, Gen-2 hasn't really grasped the context of the prompt: the shark is oddly placed, and the people are fairly disfigured and amorphous. OpenAI's Sora has done a much better job of creating the scene.
Another impressive example came recently from a director who made a music video with Sora:
This is arguably one of the most fully realised examples of Sora in action, and it shows the tool's huge potential for the future. It's interesting (and a little trippy) and captures a pretty distinct vibe that's consistent throughout.
However, there are some caveats to this creation:
- The director generated 6 hours of clips for a 4-minute video (using 46 hours of rendering time on an H100 GPU)
- The example prompt is around 1,400 words, which is pretty detailed and specific
- The director still had to use After Effects and clean up some of the transitions (which still feel unnatural in places)
So it certainly feels like we're still a way off consumer-ready use for this tool, but given the short window in which Sora has been available for artists and creatives to trial, the progress is fairly startling.
What Are the Limitations of Sora?
This section outlines a few limitations of the previous version of Sora. It’s worth checking if the new version addresses these issues. We will update this section as soon as we gain access to the new version.
Sora does not have an implicit understanding of physics, and so "real-world" physical rules may not always be adhered to. One example of this is that the model does not understand cause and effect. For example, in the following video of an explosion on a basketball hoop, after the hoop explodes, the net appears to be restored.
PROMPT: Basketball through hoop then explodes.
Similarly, the spatial position of objects may shift unnaturally. In the following video of wolf pups, animals appear spontaneously, and the position of the wolves sometimes overlaps.
PROMPT: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.
Sora's Use Cases
Sora can be used to create videos from scratch or extend existing videos to make them longer. It can also fill in missing frames from videos.
In the same way that text-to-image generative AI tools have made it dramatically easier to create images without technical image editing expertise, Sora promises to make it easier to create videos without video editing experience. Here are some key use cases.
Social media
Sora can be used to create short-form videos for social media platforms like TikTok, Instagram Reels, and YouTube Shorts. Content that is difficult or impossible to film is especially suitable. For example, this scene of Lagos in 2056 would be technically difficult to film for a social post but is easy to create using Sora.
PROMPT: A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.
Advertising and marketing
Creating adverts, promotional videos, and product demos is traditionally expensive. Text-to-video AI tools like Sora promise to make this process much cheaper. In the following example, a tourist board wanting to promote the Big Sur region of California could rent a drone to take aerial footage of the location, or they could use AI, saving time and money.
PROMPT: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
Prototyping and concept visualization
Even if AI video isn't used in a final product, it can be helpful for demonstrating ideas quickly. Filmmakers can use AI to mock up scenes before they shoot them, and designers can create videos of products before they build them. In the following example, a toy company could generate an AI mockup of a new pirate ship toy before committing to producing it at scale.
PROMPT: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.
Synthetic data generation
Synthetic data is often used where privacy or feasibility concerns prevent real data from being used. For numeric data, common use cases include financial data and personally identifiable information. Access to such datasets must be tightly controlled, but you can create synthetic data with similar statistical properties to make available to the public.
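For the numeric case, here's a toy sketch of the idea: fit simple statistics of a sensitive dataset, then sample synthetic records with similar properties. Real synthetic-data pipelines add formal privacy guarantees; this is purely illustrative, with made-up columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is sensitive data we can't release: columns = (salary, age)
real = rng.normal(loc=[50_000, 35], scale=[12_000, 8], size=(1_000, 2))

# Fit the mean and covariance, then sample fresh records from that fit
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1_000)

print(synthetic.mean(axis=0))  # close to the real means, but no real record leaks
```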
One use of synthetic video data is for training computer vision systems. As I wrote in 2022, the US Air Force uses synthetic data to improve the performance of its computer vision systems for unmanned aerial vehicles to detect buildings and vehicles at nighttime and in bad weather. Tools such as Sora make this process much cheaper and more accessible for a wider audience.
Sora's Risks
The product is new, so its risks are not yet fully documented, but they will likely be similar to those of text-to-image models.
Generation of harmful content
Without guardrails in place, Sora has the power to generate unsavory or inappropriate content: videos containing violence, gore, or sexually explicit material; derogatory depictions of groups of people and other hateful imagery; and the promotion or glorification of illegal activities.
What constitutes inappropriate content varies a lot depending on the user (consider a child using Sora versus an adult) and the context of the generation (a video warning about the dangers of fireworks might justifiably be gory for educational purposes).
Misinformation and disinformation
Based on the example videos shared by OpenAI, one of Sora's strengths is its ability to create fantastical scenes that couldn't exist in real life. This strength also makes it possible to create "deepfake" videos where real people or situations are changed into something that isn't true.
When this content is presented as truth, either accidentally (misinformation) or deliberately (disinformation), it can cause problems.
As Eske Montoya Martinez van Egerschot, Chief AI Governance and Ethics Officer at DigiDiplomacy, wrote, "AI is reshaping campaign strategies, voter engagement, and the very fabric of electoral integrity."
Convincing-but-fake AI videos of politicians or adversaries of politicians have the power to "strategically disseminate false narratives and target legitimate sources with harassment, aiming to undermine confidence in public institutions and foster animosity towards various nations and groups of people".
In a year with many important elections, from Taiwan to India to the United States, this has widespread consequences.
Biases and stereotypes
The output of generative AI models is highly dependent on the data they were trained on. That means cultural biases or stereotypes in the training data can surface as the same issues in the resulting videos. As Joy Buolamwini discussed in the Fighting For Algorithmic Justice episode of DataFramed, biases in images can have severe consequences in hiring and policing.
How Can I Access Sora?
To access Sora, go to sora.com. At the time of writing, Sora is available in most of the world, with the notable exceptions of most of Europe and the UK.
Accessing Sora requires a subscription to either ChatGPT Plus or ChatGPT Pro. Both tiers offer users the ability to explore Sora’s advanced video generation tools, but there are key differences in features and limits:
| Feature | ChatGPT Plus | ChatGPT Pro |
| --- | --- | --- |
| Price | $20/month | $200/month |
| Video Generations | Up to 50 priority videos (1,000 credits) | Up to 500 priority videos (10,000 credits), plus unlimited relaxed videos |
| Resolution & Duration | Up to 720p, 5s duration | Up to 1080p, 20s duration |
| Concurrent Generations | 0 | 5 |
| Watermark | Download with watermark | Download without watermark |
What Are the Alternatives to Sora?
There are several high-profile alternatives to Sora that allow users to create video content from text. These include:
- Runway Gen-3. The highest-profile alternative to OpenAI's Sora is Runway Gen-3. Like Sora, it is a text-to-video generative AI model, and it is currently available on web and mobile.
- Lumiere. Google recently announced Lumiere, which is currently available as an extension to the PyTorch deep-learning Python framework.
- Make-a-Video. Meta announced Make-a-Video in 2022; again this is available via a PyTorch extension.
There are also several smaller competitors:
- Pictory simplifies the conversion of text into video content, targeting content marketers and educators with its video generation tools.
- Kapwing offers an online platform for creating videos from text, emphasizing ease of use for social media marketers and casual creators.
- Synthesia focuses on creating AI-powered video presentations from text, offering customizable avatar-led videos for business and educational purposes.
- HeyGen aims to simplify video production for product and content marketing, sales outreach, and education.
- Steve AI provides an AI platform for generating videos and animations via its Prompt-to-Video, Script-to-Video, and Audio-to-Video tools.
- Elai focuses on e-learning and corporate training, offering a solution to effortlessly turn instructional content into informative videos.
| Model/Platform | Developer/Company | Platform Availability | Target Audience | Key Features |
| --- | --- | --- | --- | --- |
| Runway Gen-3 | Runway | Web, Mobile | Broad (general use) | High-profile text-to-video AI, user-friendly |
| Lumiere | Google | PyTorch Extension | Developers, Researchers | Advanced text-to-video generation for PyTorch users |
| Make-a-Video | Meta | PyTorch Extension | Creators, Researchers | High-quality video generation from text |
| Pictory | Pictory | Web | Content Marketers, Educators | Simplifies text-to-video conversion for engaging narratives |
| Kapwing | Kapwing | Web | Social Media Marketers, Casual Creators | Platform for video creation from text |
| Synthesia | Synthesia | Web | Businesses, Educators | AI-powered avatar-led video presentations from text |
| HeyGen | HeyGen | Web | Marketers, Educators | Video generation for sales and marketing |
| Steve AI | Steve AI | Web | Businesses, Individuals | Creates videos and animations for various applications |
| Elai | Elai | Web | E-learning, Corporate Training | Turns instructional content into videos |
What Does OpenAI Sora Mean for the Future?
There can be little doubt that Sora is ground-breaking. It's also clear that the potential for this generative model is vast. What are the implications of Sora for the AI industry and the world? We can, of course, only make educated guesses. However, here are some of the ways that Sora may change things, for better or worse.
Short-term implications of OpenAI Sora
Let’s first take a look at the direct, short-term impacts we might see from Sora in the wake of its (likely phased) launch to the public.
A wave of quick wins
In the section above, we’ve already explored some of Sora's potential use cases. Many of these will likely see quick adoption if and when Sora is released for public use. This might include:
- The proliferation of short-form videos for social media and advertising. Expect creators on X (formerly Twitter), TikTok, LinkedIn, and others to up the quality of their content with Sora productions.
- The adoption of Sora for prototyping. Whether it’s demonstrating new products or showcasing proposed architectural developments, Sora could become commonplace for pitching ideas.
- Improved data storytelling. Text-to-video generative AI could give us more vivid data visualization, better simulations of models, and interactive ways to explore and present data. That said, it will be important to see how Sora performs on these types of prompts.
- Better learning resources. With tools like Sora, learning materials could be greatly enhanced. Complicated concepts can be brought to life, while more visual learners have the chance for better learning aids.
A minefield of risks
Of course, as we highlighted previously, such tech comes with a swathe of potential negatives, and it’s imperative that we navigate them. Here are some of the risks we must be alert to:
- The spread of misinformation and disinformation. Collectively, we’ll have to be more discerning of the content we consume, and we’ll need better tools to spot that which is manufactured or manipulated. This is especially important in an election year.
- Copyright infringement. We'll need to be mindful of how our images and likenesses are used. Legislation and controls may be required to prevent our personal data from being used in ways we haven't consented to. This debate will most likely first play out as fans start creating videos based on their favorite film franchises; that said, the personal risks are also massive here.
- Regulatory and ethical challenges. The advances in generative AI are already proving difficult for regulators to keep up with, and Sora could exacerbate this issue. We must navigate the appropriate and fair use of Sora without impacting individual liberties or stifling innovation.
- Dependence on technology. Tools like Sora could be seen as a shortcut for many rather than an assistant. People may see it as a replacement for creativity, which could have implications for many industries and the professionals who work in them.
Generative video becomes the next frontier of competition
We've already mentioned a couple of alternatives to Sora, but we can expect this list to grow significantly in 2024 and beyond. As we saw with ChatGPT, there is an ever-growing list of alternatives vying for position, along with many projects iterating on the open-source LLMs on the market.
Sora may well be the tool that continues to drive innovation and competition in the field of generative AI. Whether it’s through use-specific, fine-tuned models or proprietary tech that’s in direct competition, many of the big players in the industry will likely want a piece of the text-to-video action.
Long-term implications of OpenAI Sora
As the dust begins to settle after the public launch of OpenAI’s Sora, we’ll start to see what the longer-term future holds. As professionals across a host of industries get their hands on the tool, there’ll inevitably be some game-changing uses for Sora. Let’s speculate on what some of these could be:
High-value use cases can be unlocked
It’s possible that Sora (or similar tools) could become a mainstay in several industries:
- Advanced content creation. We could see Sora as a tool to speed up production across fields such as VR and AR, video games, and even traditional entertainment such as TV and movies. Even if it’s not used directly to create such media, it could help to prototype and storyboard ideas.
- Personalised entertainment. We could see Sora create and curate content tailored specifically to the user: interactive, responsive media that adapt to an individual's tastes and preferences.
- Personalised education. Again, this highly individualized content could find a home in the education sector, helping students learn in a way that’s best suited to their needs.
- Real-time video editing. Video content could be edited or re-produced in real-time to suit different audiences, adapting aspects such as tone, complexity, or even narrative based on viewer preferences or feedback.
The lines between the physical and digital worlds begin to blur
We've already touched on virtual reality (VR) and augmented reality (AR), but Sora has the potential to revolutionize how we interact with digital content when combined with these mediums. If future iterations of Sora can generate high-quality virtual worlds within seconds, and use generative text and audio to populate them with seemingly real virtual characters, it raises serious questions about what it will mean to navigate the digital world in the future.
Closing Notes
OpenAI's Sora model promises a leap forward in the quality of generative video. The public release was long-awaited, and its potential applications across various sectors are highly anticipated. If you’re eager to get started in the world of generative AI, our AI Fundamentals skill track will help you get up to speed with machine learning, deep learning, NLP, generative models, and more.
FAQs
Is Sora available to the public?
Yes. At the time of writing, Sora is available in most of the world, with the notable exceptions of most of Europe and the UK.
How can I access Sora?
To access Sora, go to sora.com. At the time of writing, it is available in most of the world, with the notable exceptions of most of Europe and the UK.
Is Sora AI free?
No. Accessing Sora requires a subscription to either ChatGPT Plus or ChatGPT Pro.
How does Sora AI work?
Sora is a diffusion model. That means it starts with each frame of the video consisting of static noise and uses machine learning to gradually transform the images into something resembling the description in the prompt.
How Long Can Sora Videos Be?
Sora videos can be up to 20 seconds long for ChatGPT Pro users and up to 5 seconds for ChatGPT Plus users.
What is the maximum resolution Sora videos can have?
OpenAI’s Sora model can generate videos with a maximum resolution of 1080p (1920×1080 pixels). ChatGPT Plus subscribers can create videos with a maximum resolution of 720p, while the maximum for ChatGPT Pro users is 1080p.
What is Sora Turbo?
Sora Turbo is the faster, updated version of the original Sora model; it powers the public release at sora.com.
