OpenAI recently announced Operator, an AI agent designed to carry out web-based tasks on its own, such as booking a table or shopping online.
However, we think its potential goes beyond convenience—it could empower people who lack computer skills by enabling them to complete tasks like filling out forms or navigating complex websites with ease.
Additionally, with further integration of voice commands, it could provide a more accessible solution for individuals with disabilities, such as those with visual impairments.
Operator enters a competitive field that includes Anthropic’s computer-use capabilities and Google’s Project Mariner. One difference is that Anthropic’s tools require programming knowledge (for now), whereas Operator allows users to provide instructions in plain language, making it more accessible.
In this blog, we’ll explain what Operator is, explore its core technology (CUA), outline its use cases and limitations, and discuss where it fits within the broader context of AI agents.
What Is Operator?
Operator is OpenAI’s first AI agent, designed to autonomously perform tasks on the web. An AI agent is a system that can take instructions, reason through them, and execute actions without constant human oversight.
Unlike traditional automation tools that rely on predefined APIs or rigid workflows, Operator interacts directly with websites, mimicking human actions like clicking, typing, and scrolling. Its primary goal is to simplify digital tasks that might otherwise require manual effort or technical expertise.
This makes it well-suited for everyday activities like managing reservations or filling out forms, as well as for more complex, multi-step workflows.
Operator uses a virtual browser to navigate websites. This virtual environment enables it to interact with graphical user interfaces (GUIs) like a human user would. Instead of requiring websites to have specialized APIs, Operator interprets the visual layout of a webpage, clicks buttons, types in fields, and scrolls through content.
Operator relies on plain-language instructions to understand what users need. Once the task is set, it processes the instructions, breaks them into actionable steps, and executes them while providing feedback to the user. Operator can also ask for clarification or confirmations for critical actions, such as submitting a form or completing a payment, ensuring greater control over its output.
What Is Computer-Using Agent (CUA)?
The Computer-Using Agent (CUA) is the core technology powering Operator. Combining GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with graphical user interfaces—the buttons, menus, and text fields people see on a screen.
Perception
CUA begins by processing raw pixel data from screenshots of the screen. It uses this visual information to identify key interface elements such as buttons, input fields, and navigation menus.
Reasoning
Once the visual data is analyzed, CUA applies chain-of-thought reasoning to plan its actions. By integrating current and past screenshots, it evaluates its observations, breaks tasks into smaller steps, and adapts dynamically to challenges. For example, if a pop-up such as an ad appears during a task, CUA can adjust its approach and find a way to continue, much like a human user would.
Action
CUA uses virtual mouse and keyboard inputs to perform actions such as clicking, typing, scrolling, and submitting forms. This functionality enables it to execute tasks autonomously, whether it’s selecting an item from a dropdown menu or navigating through a multi-step form.
For critical actions—such as making payments or logging into accounts—CUA seeks user confirmation before proceeding, ensuring users maintain control over sensitive operations.
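The perception, reasoning, and action steps described above can be pictured as a simple loop. The sketch below is only an illustration of that loop, not OpenAI’s implementation or API: `Action`, `ToyAgent`, and all of their methods are hypothetical names, screenshots are stand-in strings, and the "reasoning" is a scripted plan rather than a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names here are hypothetical; this is not OpenAI's API.


@dataclass
class Action:
    kind: str                # e.g. "click", "type", "scroll", "done"
    target: str = ""         # description of the interface element
    critical: bool = False   # True for payments, logins, submissions


@dataclass
class ToyAgent:
    plan: List[Action]                  # scripted stand-in for model reasoning
    confirm: Callable[[Action], bool]   # asks the user before critical actions
    history: List[str] = field(default_factory=list)  # past "screenshots"
    log: List[str] = field(default_factory=list)      # actions taken so far

    def perceive(self, step: int) -> str:
        # A real agent would capture raw pixels; a string stands in here.
        return f"screenshot-{step}"

    def reason(self, step: int) -> Action:
        # A real agent would reason over current and past screenshots.
        return self.plan[step] if step < len(self.plan) else Action("done")

    def run(self, max_steps: int = 10) -> List[str]:
        for step in range(max_steps):
            self.history.append(self.perceive(step))   # perception
            action = self.reason(step)                 # reasoning
            if action.kind == "done":
                break
            # Critical actions go ahead only if the user approves them.
            if action.critical and not self.confirm(action):
                self.log.append(f"skipped {action.kind} {action.target}")
                break
            self.log.append(f"{action.kind} {action.target}")  # action
        return self.log


plan = [
    Action("click", "search box"),
    Action("type", "table for two"),
    Action("click", "pay now", critical=True),
]
agent = ToyAgent(plan=plan, confirm=lambda action: False)  # user declines
print(agent.run())
```

In this toy run the user declines the payment, so the agent stops and records the skipped step instead of clicking. Passing `confirm=lambda action: True` lets the critical action go ahead.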
CUA Benchmarks
CUA has achieved state-of-the-art (SOTA) performance on several benchmarks:
Benchmark Type | Benchmark | OpenAI CUA (Computer Use, Universal Interface) | Previous SOTA (Computer Use, Universal Interface) | Previous SOTA (Web Browsing Agents) | Human
---|---|---|---|---|---
Computer Use | OSWorld | 38.1% | 22.0% | - | 72.4%
Browser Use | WebArena | 58.1% | 36.2% | 57.1% | 78.2%
Browser Use | WebVoyager | 87.0% | 56.0% | 87.0% | -
Source: OpenAI
Let’s break down what each of these three benchmarks does:
- OSWorld (38.1%): Assesses the ability to perform tasks in full operating systems like Ubuntu, Windows, and macOS. Although CUA outperforms previous models, its success rate is still below the human benchmark of 72.4%.
- WebArena (58.1%): Evaluates performance on simulated websites, including e-commerce and social platforms. CUA edges past the previous best web-browsing agent (57.1%) but still trails the human score of 78.2%, leaving room for improvement on complex, multi-step interactions.
- WebVoyager (87%): Measures effectiveness on live websites like Amazon, GitHub, and Google Maps. CUA performs strongly here, matching the previous best web-browsing agent (87.0%), as tasks tend to be simpler and more structured than in WebArena.
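To put these numbers side by side, here is a small sketch that computes CUA’s gain over the previous state of the art and its remaining gap to human performance, both in percentage points. The scores are copied from the table above, with the previous SOTA taken from the computer-use (universal interface) column. The `scores` dictionary and the `summarize` helper are our own illustrative names.

```python
# Scores copied from the benchmark table (percent success rate).
# "prev_sota" is the previous SOTA for the computer-use (universal
# interface) setting; no human score is listed for WebVoyager.
scores = {
    "OSWorld": {"cua": 38.1, "prev_sota": 22.0, "human": 72.4},
    "WebArena": {"cua": 58.1, "prev_sota": 36.2, "human": 78.2},
    "WebVoyager": {"cua": 87.0, "prev_sota": 56.0, "human": None},
}


def summarize(name, row):
    """Return (benchmark, gain over previous SOTA, gap to human) in points."""
    gain = round(row["cua"] - row["prev_sota"], 1)
    gap = None if row["human"] is None else round(row["human"] - row["cua"], 1)
    return name, gain, gap


for name, row in scores.items():
    benchmark, gain, gap = summarize(name, row)
    gap_text = "not listed" if gap is None else f"{gap} points"
    print(f"{benchmark}: +{gain} points over previous SOTA, gap to human: {gap_text}")
```

On OSWorld, for instance, this gives a gain of 16.1 points and a remaining gap of 34.3 points to the human score.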
The graph below illustrates the performance of OpenAI’s CUA compared to Claude 3.5 Sonnet on the OSWorld benchmark. The x-axis represents the maximum number of steps allowed for task completion, while the y-axis shows the success rate as a percentage. CUA demonstrates steady improvement with more steps allowed, outperforming previous state-of-the-art models.
Source: OpenAI
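One way to see why success rate rises with the step budget: a task counts as solved only if the agent finishes within the allowed number of steps, so raising the cap can only add solved tasks, never remove them. The sketch below uses invented numbers purely to illustrate this. `steps_needed` is hypothetical data, not OSWorld results, and `success_rate` is our own helper.

```python
# Hypothetical data: steps each of ten made-up tasks would need to finish.
# None marks a task the agent never completes, whatever the budget.
# These numbers are invented for illustration and are NOT benchmark results.
steps_needed = [4, 7, 12, 15, 23, 31, 48, 60, None, None]


def success_rate(steps_needed, max_steps):
    """Percentage of tasks finished within max_steps."""
    solved = sum(1 for s in steps_needed if s is not None and s <= max_steps)
    return 100.0 * solved / len(steps_needed)


for budget in (10, 50, 100, 200):
    print(f"max steps = {budget}: {success_rate(steps_needed, budget)}%")
```

With this toy data the rate climbs as the budget grows and then plateaus once every solvable task fits inside the cap.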
How to Access Operator
Operator is currently available in the United States as a research preview for ChatGPT Pro users. If you have an active Pro subscription, you can visit operator.chatgpt.com to start using it.
OpenAI plans to expand access to Plus users in the coming months. This staged rollout allows OpenAI to gather feedback and improve the system before offering it to a wider audience.
While Operator is focused on U.S. users during the initial launch, OpenAI has stated that accessibility in Europe and other regions will take longer due to regulatory challenges. Users in these regions will need to wait for future updates as OpenAI works to navigate these complexities.
Looking ahead, OpenAI also plans to make the underlying technology behind Operator, known as CUA, available through an API. This would enable developers to create their own AI-powered agents for custom applications.
Operator’s Use Cases
The demo examples for Operator—such as booking a table or shopping online—are functional, but to us, they don’t feel particularly practical. It’s often faster and easier to perform these tasks manually rather than spending time monitoring an AI’s execution.
However, Operator’s potential becomes clearer when you think beyond these use cases, focusing on accessibility or institutional support.
Accessibility
One of the most impactful areas where Operator could shine is in accessibility. For individuals with limited computer skills, such as the elderly or those new to technology, Operator could act as a guide, helping them navigate complex online tasks without needing prior expertise.
Imagine if this were combined with voice commands: users wouldn’t even need to type a prompt, making the tool even more intuitive.
Similarly, for individuals with disabilities, like those with visual impairments, Operator could help them interact with websites that might otherwise be inaccessible, especially if paired with audio feedback or screen-reader support.
Institutional support
Operator has strong potential in government and institutional settings. It could assist citizens in filling out complex forms for tasks like applying for visas, filing taxes, or accessing social benefits. This would reduce the reliance on in-person assistance and improve processes for both users and institutions.
In education, Operator could simplify online application systems, scholarship submissions, and research tasks, enabling students or those with limited digital literacy to navigate these processes more effectively.
Small businesses and professional tasks
In the workplace, Operator could be valuable for small businesses by automating repetitive web-based tasks such as managing inventory, processing online orders, or collecting customer feedback. For professionals, it could handle tedious workflows, like gathering information from multiple sources or completing forms, freeing up time for more strategic work.
Healthcare and non-profits
Healthcare and non-profits could benefit significantly from Operator. Clinics could use it to help patients complete online registration forms or access resources without requiring extensive staff involvement.
Non-profits operating in regions with low digital literacy might deploy Operator to help underserved populations navigate essential online systems, ensuring that technological barriers don’t limit access to vital services.
Competing AI Agents
OpenAI’s Operator enters the space of AI agents alongside Anthropic’s computer-use capabilities and Google’s Project Mariner.
Anthropic’s computer use
Anthropic’s computer use, powered by its Claude 3.5 Sonnet model, allows the AI to interact with desktop environments by simulating human actions like clicking, typing, and navigating. Currently, this feature requires some technical knowledge to set up and use effectively via the API, limiting its accessibility for non-technical users.
In contrast, Operator’s plain-language interface eliminates the need for programming knowledge, making it more user-friendly for a wider audience. However, we expect Anthropic to work toward simplifying its tools to compete more directly with Operator’s accessible design.
Google’s Project Mariner
Project Mariner, developed by Google DeepMind, is an experimental agent designed to navigate and interact with web pages autonomously. It is still in its research phase and is being tested with a small group of users. Its integration within Google’s ecosystem suggests it could excel in workflows involving Gmail, Google Docs, and other Google services.
Conclusion
Operator is OpenAI’s first step into the competitive field of AI agents, offering a unique approach with its plain-language interface and universal browser-based design. While tools like Anthropic’s computer use and Google’s Project Mariner bring their own strengths, Operator’s focus on accessibility sets it apart for now.
We’re also curious about the potential for other players, like DeepSeek or Meta, to join the competition. 2025 might actually live up to its hype and be the year of agentic AI.
FAQs
Can OpenAI Operator handle more than one task at the same time?
Yes, Operator is designed to manage multiple tasks simultaneously. You can open a separate conversation for each task, and Operator executes them in parallel. For example, you can have Operator order groceries on Instacart while also making a booking on Booking.com.
Is OpenAI Operator an AI agent?
Yes, OpenAI Operator is an AI agent designed to autonomously perform tasks for you. It interacts with websites by navigating, clicking, and filling out forms, allowing you to automate activities. Learn more about AI agents with our blog post: Understanding AI Agents: The Future of Autonomous Systems.
How does Operator work?
Powered by the Computer-Using Agent (CUA) model, Operator interacts with web pages by viewing screenshots and performing mouse and keyboard actions. It can self-correct or request user help when needed.
Who can use Operator right now, and how can they get started?
Operator is available to Pro users in the U.S. If you have a Pro subscription, you can visit operator.chatgpt.com to start. Try describing a task, and Operator will handle it.
What are the current limitations of Operator?
Because Operator is still a research preview, it might struggle with complex tasks like creating slideshows or managing calendars.
Will Operator be available on mobile devices?
There is no confirmation yet about mobile support for Operator, but its ability to interact with web interfaces could make it adaptable to mobile platforms in the future as the technology develops.
How does Operator compare to voice assistants like Siri or Google Assistant?
Operator focuses on web-based tasks and directly interacting with websites, while traditional voice assistants typically rely on predefined app integrations or APIs. Operator’s ability to mimic human actions like clicking and scrolling sets it apart in terms of versatility for complex online tasks.
Can Operator handle websites that use CAPTCHA or advanced security features?
Operator currently relies on user input for tasks involving CAPTCHAs or sensitive logins. It does not bypass these systems automatically but can navigate workflows once such barriers are resolved.
I’m an editor and writer covering AI blogs, tutorials, and news, ensuring everything fits a strong content strategy and SEO best practices. I’ve written data science courses on Python, statistics, probability, and data visualization. I’ve also published an award-winning novel and spend my free time on screenwriting and film directing.

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!