
OpenAI has introduced Operator, an autonomous AI agent designed to transform user prompts into real-world online tasks. Operator can execute a variety of requests using a web browser, including booking travel, making restaurant reservations, and purchasing concert tickets.
Operator is currently in research preview and available only to ChatGPT Pro subscribers. It runs on OpenAI’s Computer-Using Agent (CUA) model, which combines the vision capabilities of GPT-4o with training to interact with graphical user interfaces (GUIs).
While Operator is similar to other AI agents like ByteDance’s UI-TARS, its unique feature is its ability to operate independently of APIs. OpenAI states, “Operator can ‘see’ (through screenshots) and ‘interact’ (using mouse and keyboard actions) with a browser, enabling web actions without requiring custom API integrations.”
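The screenshot-and-input cycle described above can be illustrated with a minimal sketch. It is not OpenAI's implementation: `FakeBrowser`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names invented for this example, and a fixed script stands in for the CUA model. The only point is the shape of the loop: capture a screenshot, ask a policy for the next mouse or keyboard action, apply it, and repeat until the policy reports it is finished.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the policy (hypothetical structure)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here the frame encodes the step count.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_policy(task: str, screenshot: bytes) -> Action:
    """Assumed placeholder for the model: click a field, type the task, stop."""
    step = int(screenshot.decode().split("-")[1])
    script = [Action("click", x=120, y=40), Action("type", text=task)]
    return script[step] if step < len(script) else Action("done")


def run_agent(
    task: str,
    browser: FakeBrowser,
    policy: Callable[[str, bytes], Action],
    max_steps: int = 10,
) -> bool:
    """Loop: see (screenshot) -> decide (policy) -> act (mouse/keyboard)."""
    for _ in range(max_steps):
        action = policy(task, browser.screenshot())
        if action.kind == "done":
            return True
        browser.apply(action)
    return False  # step budget exhausted before the policy finished


if __name__ == "__main__":
    browser = FakeBrowser()
    finished = run_agent("table for two", browser, scripted_policy)
    print(finished, browser.log)
```

Because the loop depends only on screenshots and input events, nothing in it is specific to a particular website, which is the sense in which such an agent needs no custom API integration. The `max_steps` cap is a design choice of this sketch to keep a confused policy from running indefinitely.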
OpenAI is working with companies including DoorDash, Instacart, and OpenTable to optimize their services for Operator. Results may vary on platforms that have not been optimized.
As for effectiveness, OpenAI reports success rates of 38.1% on OSWorld, a benchmark for general computer-use tasks, and 58.1% on WebArena and 87% on WebVoyager, both of which measure web-based tasks. Users have noted that Operator can struggle with accuracy and speed, prompting some skepticism about its reliability for daily tasks.
OpenAI addresses safety concerns, stating, “We know bad actors may try to misuse this technology. That’s why we have designed Operator to refuse harmful requests and block disallowed content.”
These developments highlight the potential for AI agents to assist in routine online tasks, raising questions about their future capabilities and reliability.