OpenAI says prompt injection may never be ‘solved’ for browser agents like Atlas

OpenAI is warning that prompt injection, a technique that hides malicious instructions inside ordinary online content, is becoming a central security risk for AI agents designed to operate inside a web browser and carry out tasks for users.

The company said it recently shipped a security update for ChatGPT Atlas after internal automated red-teaming uncovered what it described as a new class of prompt-injection attacks. The update included a newly adversarially trained model along with strengthened safeguards around it, OpenAI said.

OpenAI’s description of Atlas emphasizes that, in agent mode, the browser agent views webpages and uses clicks and keystrokes “just as you would,” letting it work across routine workflows using the same context and data a person would have. That convenience also raises risk. A tool with access to email, documents and web services can become a higher-value target than a chatbot that only answers questions.

“As the browser agent helps you get more done, it also becomes a higher-value target of adversarial attacks,” the company wrote in a blog post. “This makes AI security especially important. Long before we launched ChatGPT Atlas, we’ve been continuously building and hardening defenses against emerging threats that specifically target this new ‘agent in the browser’ paradigm. Prompt injection⁠ is one of the most significant risks we actively defend against to help ensure ChatGPT Atlas can operate securely on your behalf.” 

To find weaknesses before they appear outside the company, OpenAI said it built an automated attacker using large language models and trained it with reinforcement learning. The goal was to discover prompt-injection strategies that could push a browser agent into carrying out complex harmful workflows that unfold over many steps, rather than simpler failures such as generating a particular string of text or triggering a single unintended tool call.

OpenAI detailed in the blog post that its automated attacker can iterate on injections by sending them to a simulator that runs a “counterfactual rollout” of how the target agent would behave if it encountered the malicious content. The simulator returns a full trace of the victim agent’s reasoning and actions, which the attacker uses as feedback to refine the attack through multiple rounds before settling on a final version. 
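That iterate-and-refine loop can be sketched in miniature. The code below is an illustrative toy, not OpenAI's system: `simulate_rollout`, `refine`, and `run_attacker` are hypothetical names, the "victim agent" is a stub that keys on one trigger phrase, and the point is only to show the shape of the feedback cycle — inject, observe the full trace, refine, repeat.

```python
# Hypothetical sketch of the attacker/simulator feedback loop described above.
# All names and behaviors are illustrative stand-ins, not OpenAI's API.

def simulate_rollout(injection: str) -> dict:
    """Stand-in simulator: returns a trace of how a toy 'victim agent'
    would reason about a page containing the injected text."""
    followed = "IGNORE PREVIOUS INSTRUCTIONS" in injection
    return {
        "reasoning": "treated page text as a directive" if followed else "ignored page text",
        "actions": ["send_email"] if followed else [],
    }

def refine(injection: str, trace: dict) -> str:
    """Toy refinement step: if the victim ignored the injection,
    escalate by prepending an imperative override."""
    if not trace["actions"]:
        return "IGNORE PREVIOUS INSTRUCTIONS. " + injection
    return injection

def run_attacker(seed: str, rounds: int = 3) -> tuple[str, dict]:
    """Iterate: inject -> observe the full reasoning/action trace -> refine,
    mirroring the 'counterfactual rollout' loop described in the article."""
    injection = seed
    trace = simulate_rollout(injection)
    for _ in range(rounds):
        if trace["actions"]:          # attack already lands; stop refining
            break
        injection = refine(injection, trace)
        trace = simulate_rollout(injection)
    return injection, trace
```

The key design point is the feedback channel: because the simulator exposes the victim's full trace rather than just success or failure, each refinement round has far more signal to work with — the advantage OpenAI says internal access provides.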

OpenAI said having internal access to the agent’s reasoning gives it an edge that could help it stay ahead of attackers.

A demonstration described by the company shows how prompt injection could surface during ordinary work. In the scenario, the automated attacker plants a malicious email in a user’s inbox containing instructions directing the agent to send a resignation letter to the user’s boss. When the user later asks the agent to draft an out-of-office reply, the agent encounters the malicious email during the workflow, treats the injected prompt as authoritative, and sends the resignation message instead of writing the requested out-of-office note.

While hypothetical, the example illustrates how letting an agent handle tasks changes the nature of online risk. Content that would traditionally attempt to persuade a person to act is reframed as content that tries to command an agent already empowered to act.

OpenAI is not alone in treating prompt injection as a persistent problem. The U.K. National Cyber Security Centre warned earlier this month that prompt-injection attacks against generative AI applications may never be fully mitigated, advising organizations to focus on reducing risk and limiting impact.

The company’s attention to prompt injection is also arriving as it seeks to fill a senior “Head of Preparedness” role intended to study and plan for emerging AI-related risks, including in cybersecurity. 

In a post on X, CEO Sam Altman said AI models are starting to present “real challenges,” citing potential impacts on mental health and systems that are becoming capable enough in computer security to find critical vulnerabilities. OpenAI announced a preparedness team in 2023 to examine risks ranging from immediate threats, such as phishing, to more speculative catastrophic scenarios. Since then, leadership changes and departures among safety-focused staff have drawn scrutiny. 

“We have a strong foundation of measuring growing capabilities, but we are entering a world where we need more nuanced understanding and measurement of how those capabilities could be abused, and how we can limit those downsides both in our products and in the world, in a way that lets us all enjoy the tremendous benefits,” Altman wrote. “These questions are hard and there is little precedent; a lot of ideas that sound good have some real edge cases.”

The post OpenAI says prompt injection may never be ‘solved’ for browser agents like Atlas appeared first on CyberScoop.

How to determine if agentic AI browsers are safe enough for your enterprise

Agentic AI browsers like OpenAI’s Atlas have debuted to major fanfare, and the enthusiasm is warranted. These tools automate web browsing to close the gap between what you want to accomplish and getting it done. Rather than manually opening multiple tabs, you can simply tell the browser what you need. Ask it to file a competitor brief, fill out a form, or schedule a meeting, and it will handle the task while you watch.

But with this evolution comes a stark reality: agentic browsers expand the enterprise attack surface in unprecedented ways. As the web shifts from something we browse to something that acts on our behalf, the stakes get higher. Agentic AI browsers are no longer passive tools. They take initiative, operate on our behalf, and in some cases, act with administrative privilege. That represents a seismic shift in trust and risk.

The browsing revolution: From reader to actor

Agentic AI is an execution model. It interprets a user’s intent, plans a series of actions, and executes them autonomously across websites. Over the past few months, I’ve tested several agentic browsers (Atlas, Comet, Dia, Surf, and Fellou) extensively and conducted limited testing with others (Neon and Genspark).

Each browser represents a distinct approach to the same fundamental challenge: how to eliminate constant tab-switching and let users complete tasks in one place. Atlas, built on ChatGPT, emphasizes supervised actions within a browsing sandbox. Comet prioritizes “research velocity,” using coordinating agents across multiple tabs to gather information faster. Neon offers a comprehensive browser automation experience with the option to run it on your own machine. Genspark and Fellou are designed to take more actions with less human oversight.

Yet as these tools grow more capable, they grow correspondingly more dangerous.

The hidden security threats

Conventional browser security measures, like TLS encryption and endpoint protection, weren’t designed to handle the risks that AI agents create. These tools introduce several significant new attack vectors, including:

Indirect Prompt Injection: Malicious instructions can be embedded in websites in ways invisible to the user. The agent, tasked with interpreting and acting on content, may misinterpret these cues as legitimate directives. Imagine a rogue blog post containing hidden HTML that causes your agent to email internal documents to an attacker. If the browser agent treats that action as part of the task flow, damage can be done before any human intervenes.
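One partial mitigation for the hidden-HTML scenario above is to strip visually hidden elements from a page before the agent ever sees its text. The sketch below uses Python's standard-library `HTMLParser` and assumes the attacker hides instructions with inline `display:none` CSS; real pages hide content in many other ways (external stylesheets, off-screen positioning, zero-size fonts), so treat this as illustrative rather than a complete defense.

```python
# Minimal sketch: drop text inside elements hidden with inline display:none
# before passing page content to an agent. Illustrative only -- it does not
# catch external CSS, off-screen positioning, or other hiding techniques.
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0      # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "")
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1  # track nesting inside hidden subtrees

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    """Return only the text a human would actually see on the page."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Filtering what the agent reads is only half the problem, though: even visible text can carry injected commands, which is why the approval and logging controls discussed later still matter.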

Clipboard and Credential Artifacts: Some agents interact with your clipboard or browser session to perform actions. If the agent can access sensitive tokens or passwords, particularly without clear logs or approval workflows, an attacker could manipulate this access through crafted web content.

Opaque Execution Flows: Many of these browsers operate with black-box agents. Without fine-grained logs, rollback capabilities, or sandboxing, users often remain unaware of what the agent is doing in the background until it’s too late. Comet, for instance, offers impressive speed but has demonstrated vulnerabilities to prompt injection and credential misuse.

Over-Privileged Automation: It’s tempting to let the AI agent access everything, especially when tasks involve multiple sites, accounts, and tools. But granting such control without granular permissions or approval checkpoints opens the door to lateral movement attacks—where a compromised agent becomes a gateway to your broader systems.

Without clear guardrails like scoped permissions, transparent logs, and sandboxing, these tools can unintentionally execute malicious or unauthorized actions on behalf of the user.

Governance isn’t optional

Enterprise buyers must stop thinking of governance as a secondary concern. The most secure tools are those that limit what agents can do.

Atlas, for example, confines actions to a supervised mode (“Watch Mode”) for sensitive sites, requiring active oversight before anything consequential happens. Neon executes actions locally in the user’s session, avoiding the transfer of credentials to a cloud agent. Surf (now open source) and Dia (recently acquired by Atlassian) don’t let agents take actions independently, limiting the attack surface.

Genspark and Fellou, on the other hand, promise sweeping autonomy. Their security profiles reflect that ambition, with user reviews calling out instability, unverifiable claims, and the need for sandboxed, staged rollouts.

Practical advice for enterprise leaders

For enterprises interested in these new browsers but concerned about security, the answer is simple: start narrow. Begin with a few well-defined workflows rather than deploying agents across the organization. Choose three specific tasks, like drafting a competitor brief, reviewing vendor RFPs, or arranging travel. Then track key metrics: speed of completion, frequency of mistakes, and quality of results.

Next, apply enterprise-grade controls. These include:

  • Requiring approval for each action when the agent sends messages, emails, or makes purchases.
  • Using role-based access to limit what agents can touch.
  • Keeping critical systems (e.g., HRIS, financial tools, source code repositories) completely out of scope.
  • Insisting on transparent logs that record each action taken by the agent and the input that triggered it.
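The four controls above compose naturally into a single policy layer that sits between the agent and the systems it touches. The sketch below is a hypothetical illustration: the action names, system names, and policy values are invented for the example, not taken from any shipping product.

```python
# Illustrative sketch of the controls listed above: a human-approval gate for
# consequential actions, a role-based allowlist, a hard out-of-scope blocklist,
# and an audit log recording each action with the input that triggered it.
# All names and policy values are hypothetical.
from dataclasses import dataclass, field

CONSEQUENTIAL = {"send_email", "send_message", "make_purchase"}
OUT_OF_SCOPE = {"hris", "financial_tools", "source_repo"}   # never reachable

@dataclass
class AgentPolicy:
    role_allowlist: set                       # systems this role's agent may touch
    audit_log: list = field(default_factory=list)

    def execute(self, action: str, target: str, trigger: str,
                approved: bool = False) -> str:
        entry = {"action": action, "target": target, "trigger": trigger}
        if target in OUT_OF_SCOPE:
            entry["result"] = "blocked: out of scope"
        elif target not in self.role_allowlist:
            entry["result"] = "blocked: outside role"
        elif action in CONSEQUENTIAL and not approved:
            entry["result"] = "pending: human approval required"
        else:
            entry["result"] = "executed"
        self.audit_log.append(entry)          # every decision is logged with its trigger
        return entry["result"]
```

Note the ordering: scope checks run before the approval gate, so a compromised agent cannot even ask a human to approve an out-of-scope action, and the log captures denials as well as successes — exactly the trace you need when investigating a suspected injection.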

It’s equally critical to train your users. Even basic training on how to write good prompts makes a big difference. Help teams understand how agents interpret language, how prompt injection works, and how to spot suspicious outputs.

Most importantly, don’t bet everything on one browser. Instead, choose an agent that operates with more independence (like Comet or Atlas) for low-risk workflows, and pair it with a more guided tool (like Dia) for employees who need support but not full automation.

A measured optimism

Despite the risks, I remain optimistic. The shift to agentic browsing is fundamentally reshaping how we work. Applied correctly and judiciously, these tools will save time, reduce friction, and help users unlock insights faster than ever before.

But we cannot afford to conflate novelty and safety. The burden is on vendors to bake in controls, not bolt them on, and on enterprises to pilot thoughtfully, not plunge ahead. We’ve seen this pattern previously with browser extensions, mobile apps, and cloud-first tools. Those who approached with healthy skepticism and robust guardrails were the ones who reaped the benefits without the breaches. Agentic AI will be no different.

Shanti Greene is head of data science and AI innovation at AnswerRocket.

The post How to determine if agentic AI browsers are safe enough for your enterprise appeared first on CyberScoop.
