CyberScoop
US government, allies publish guidance on how to safely deploy AI agents 1 May 2026 at 12:49

US government, allies publish guidance on how to safely deploy AI agents

By: Greg Otto

1 May 2026 at 12:49

Cybersecurity agencies from the United States, Australia, Canada, New Zealand and the United Kingdom jointly published guidance Friday urging organizations to treat autonomous artificial intelligence systems as a core cybersecurity concern, warning that the technology is already being deployed in critical infrastructure and defense sectors with insufficient safeguards.

The guidance focuses on agentic AI — software built on large language models that can plan, make decisions and take actions autonomously. In order for this software to function it needs to connect to external tools, databases, memory stores and automated workflows, allowing it to execute multi-step tasks without human review at each stage.

The guidance was co-authored by the U.S. Cybersecurity and Infrastructure Security Agency, the National Security Agency, the Australian Signals Directorate’s Australian Cyber Security Centre, the Canadian Centre for Cyber Security, New Zealand’s National Cyber Security Centre and the United Kingdom’s National Cyber Security Centre.

The agencies’ central message is that agentic AI does not require an entirely new security discipline. Organizations should fold these systems into the cybersecurity frameworks and governance structures they already maintain, applying established principles such as zero trust, defense-in-depth and least-privilege access.

The document identifies five broad categories of risk. The first is privilege: When agents are granted too much access, a single compromise can cause far more damage than a typical software vulnerability. The second covers design and configuration flaws, where poor setup creates security gaps before a system even goes live.

The third category covers behavioral risks, or cases where an agent pursues a goal in ways its designers never intended or predicted. The fourth is structural risk, where interconnected networks of agents can trigger failures that spread across an organization’s systems.

The fifth category is accountability. Agentic systems make decisions through processes that are difficult to inspect and generate logs that are hard to parse, making it difficult to trace what went wrong and why. The agencies also note that when these systems fail, the consequences can be concrete: altered files, changed access controls and deleted audit trails.

The guidance also flags prompt injection, where instructions embedded inside data can hijack an agent’s behavior to perform malicious tasks. Prompt injection has been a lingering problem with large language models, with some companies admitting that the problem may never be solved.

Identity management gets significant attention throughout the document. The agencies recommend that each agent carry a verified, cryptographically secured identity, use short-lived credentials and encrypt all communications with other agents and services. For high-impact actions, a human should have to sign off, and the guidance is explicit that deciding which actions require that approval is a job for system designers, not the agent.

The agencies admit the security field has not fully caught up with agentic AI. Some risks unique to these systems are not yet covered by existing frameworks, and the guidance calls for more research and collaboration as the technology takes on a growing number of operational roles.

“Until security practices, evaluation methods and standards mature, organisations should assume that agentic AI systems may behave unexpectedly and plan deployments accordingly, prioritising resilience, reversibility and risk containment over efficiency gains,” the guidance reads.

You can read the full guidance below.

CAREFUL ADOPTION OF AGENTIC AI SERVICES_FINAL Download

The post US government, allies publish guidance on how to safely deploy AI agents appeared first on CyberScoop.

SecurityWeek RSS Feed
Malicious AI Prompt Injection Attacks Increasing, but Sophistication Still Low: Google 27 April 2026 at 08:08

Malicious AI Prompt Injection Attacks Increasing, but Sophistication Still Low: Google

SecurityWeek RSS Feed

By: Eduard Kovacs

27 April 2026 at 08:08

The tech giant found that many indirect prompt injection attempts are harmless, but some malicious exploits have also been identified.

The post Malicious AI Prompt Injection Attacks Increasing, but Sophistication Still Low: Google appeared first on SecurityWeek.

CyberScoop
‘GrafanaGhost’ bypasses Grafana’s AI defenses without leaving a trace 7 April 2026 at 09:44

‘GrafanaGhost’ bypasses Grafana’s AI defenses without leaving a trace

CyberScoop

By: Greg Otto

7 April 2026 at 09:44

Security researchers at Noma Security have disclosed a new vulnerability they are calling GrafanaGhost, an exploit capable of silently stealing sensitive data from Grafana environments by chaining multiple security bypasses, including a method that circumvents the platform’s AI model guardrails without requiring any user interaction.

Grafana is widely deployed across enterprise organizations as a central hub for observability and data monitoring, typically housing real-time financial metrics, infrastructure health data, private customer records, and operational telemetry, among other uses. That concentration of sensitive information is what makes the platform a significant target. GrafanaGhost exploits how Grafana’s AI components process user-controlled input to bridge the gap between a private data environment and an external attacker-controlled server.

The attack requires no login credentials and does not depend on a user clicking a malicious link. It begins when an attacker crafts a specific URL path using query parameters originating outside the victim organization’s environment. Because Grafana handles entry logs, an attacker can gain access to an enterprise environment to which they have no legitimate connection. The attacker then injects hidden instructions that Grafana’s AI processes — a tactic known as prompt injection — using specific keywords to cause the model to ignore its own guardrails.

Grafana has built-in protections designed to prevent prompt injection, but Noma’s researchers found a flaw in the logic underlying that protection — one that could be exploited by formatting a web address in a way that Grafana’s security check misread as safe, while the browser treated it as a request to an external server the attacker controlled. The gap between what the security check believed it was allowing and what actually happened was enough to open the door for the attack.

The final obstacle was the AI model’s own instinct for self-defense. When researchers first attempted to slip malicious instructions past it, the model recognized the pattern and refused. After further study of how the model processed different types of input, they found a specific keyword that caused it to stand down, treating what was effectively an attack instruction as a routine and legitimate request.

With all three bypasses in place, the attack runs on its own. The AI processes the malicious instruction, attempts to load an image from the attacker’s server, and in doing so quietly carries the victim’s sensitive data along with that request in an image tag. The data is gone before anyone in the organization knows a request was ever made.

Noma’s researchers noted that multiple security layers were present in Grafana’s implementation, but each contained its own exploitable weakness. The domain validation logic, the AI model guardrails, and the content security controls all failed when approached in sequence.

Because the exploit is triggered by indirect prompt injection rather than a suspicious link or an obvious intrusion, there is nothing for a user to notice, no access-denied error for an administrator to find, and no anomalous event for a security team to investigate. To a data team, a DevSecOps engineer, or a CISO, the activity is indistinguishable from routine processes.

“The payload sits inside what looks like a legitimate external data source. The exfiltration happens through a channel the AI itself initiates, which looks like normal AI behavior to any observer. Traditional SIEM rules, DLP tools, and endpoint monitoring aren’t designed to interrogate whether an AI’s outbound call was instructed by a user or by an injected prompt,” Sasi Levi, vulnerability research lead at Noma Labs, told CyberScoop. “Without runtime protection that understands AI-specific behavior, monitoring what the model was asked, what it retrieved, and what actions it took, this attack would be effectively invisible.”

The attack is another example of a broader shift in how adversaries are approaching enterprise environments that have integrated AI-assisted features. Rather than exploiting broken application code in the traditional sense, attackers are increasingly targeting weak AI security surfaces and indirect prompt injection methods that allow them to access and extract critical data assets while remaining entirely invisible to the security teams responsible for protecting them.

Noma has found similar issues over the past year, with Levi telling CyberScoop that researchers keep seeing the same fundamental gap: AI features are being bolted onto platforms that were never designed with AI-specific threat models in mind.

“The attack surface isn’t a misconfigured firewall or an unpatched library, rather it is the weaponization of the AI’s own reasoning and retrieval behavior. These platforms trust the content they ingest far too implicitly,” Levi said.

The research is another example of how attackers can weaponize AI in a manner that current defenses cannot keep up with, making it extremely difficult for defenders to keep pace.

“Offensive researchers and, increasingly, sophisticated threat actors are well ahead of most enterprise defenders on this,” Levi said. “The frameworks, detection signatures, and incident response playbooks for AI-native attacks simply don’t exist at scale yet. What gives us some optimism is that awareness is growing quickly, but awareness and readiness are very different things.”

Grafana Labs was notified through responsible disclosure protocols, worked with Noma to validate the findings, and issued a fix.

However, Joe McManus, CISO at Grafana Labs, told CyberScoop the company disputes “the claim that this finding constitutes either a ‘zero-click’ attack or that it could operate silently, autonomously, or in the background.”

“Any successful execution of this exploit would have required significant user interaction: specifically, the end user would have to repeatedly instruct our AI assistant to follow malicious instructions contained in logs, even after the AI assistant made the user aware of the malicious instructions,” McManus told CyberScoop via email. “We emphasize that there is no evidence of this bug having been exploited in the wild, and no data was leaked from Grafana Cloud.”

Update: April 7, 12:43 p.m.: This story has been updated with comment from Grafana.

The post ‘GrafanaGhost’ bypasses Grafana’s AI defenses without leaving a trace appeared first on CyberScoop.

SecurityWeek RSS Feed
OpenAI Launches Bug Bounty Program for Abuse and Safety Risks 27 March 2026 at 09:33

OpenAI Launches Bug Bounty Program for Abuse and Safety Risks

SecurityWeek RSS Feed

By: Ionut Arghire

27 March 2026 at 09:33

Through the new program, OpenAI will reward reports covering design or implementation issues leading to material harm.

The post OpenAI Launches Bug Bounty Program for Abuse and Safety Risks appeared first on SecurityWeek.

CyberScoop
Researchers discover suite of agentic AI browser vulnerabilities 3 March 2026 at 15:58

Researchers discover suite of agentic AI browser vulnerabilities

CyberScoop

By: djohnson

3 March 2026 at 15:58

Researchers have discovered multiple vulnerabilities that let attackers to quietly hijack agentic AI browsers.

Researchers at Zenity Labs discovered these flaws, which affected multiple AI browsers, including Perplexity’s Comet. Before being patched, an attacker could exploit them via a legitimate calendar invite, using a prompt injection to force the AI browser to act against its user.

“These issues do not target a single application bug,” Stav Cohen, senior AI security researcher at Zenity Labs, wrote in a blog published Tuesday. “They exploit the execution model and trust boundaries of AI agents, allowing attacker controlled content to trigger autonomous behavior across connected tools and workflows.”

Prompt injection and AI hijacking attacks work because many agentic browsers can’t differentiate between instructions given by users and any outside content they ingest. Essentially, any webpage or email the browser encounters, if phrased the right way, could be interpreted as a straightforward prompt instruction.

By seeding the calendar invite with malicious prompts, the browser can be directed to access local file systems, browse directories, open and read files, and exfiltrate data to a third-party server. No malware or special access is required, only that the user accept the invite so the browser performs “each step as part of what it believes is a legitimate task delegated by the user.”

“Comet follows its normal execution model and operates within its intended capabilities,” Cohen wrote. “The agent is persuaded that what the user actually asked for is what the attacker desires.”

The potential damage doesn’t stop there. Another vulnerability allowed an attacker to use similar indirect prompting techniques to have Comet take over a user’s password manager. If a user is already signed in to the service, the agentic browser also has full access, and can silently change settings and passwords or extract secrets while the user receives “benign” outputs.

According to Zenity, the vulnerabilities were reported to Perplexity last year, with a fix issued in February 2026.

Prompt injection attacks remain one of the biggest ongoing challenges to integrating AI into organizations’ technology stacks, because eliminating these flaws entirely may be impossible. : OpenAI said in December that such vulnerabilities are “unlikely to ever” be fully solved in agentic browsers, though the company said the overall dangers could be reduced through automated attack discovery, adversarial training and new “system level safeguards.”

Cohen notes that with traditional browsers, local file access and other sensitive tasks can only be obtained with explicit user permission. But agentic browsers have far more autonomy to infer whether that access is necessary to carry out the user’s request, and take action without user input. While researchers used calendar invites to deliver the malicious prompts, the same technique can be deployed through nearly any form of written content.

“Once that decision is delegated, access to sensitive resources depends on the agent’s interpretation of intent rather than on an explicit user action,” he wrote. “At that point, the separation between user intent and agent execution becomes a security-critical concern.”

The post Researchers discover suite of agentic AI browser vulnerabilities appeared first on CyberScoop.

CyberScoop
Proofpoint acquires Acuvity to tackle the security risks of agentic AI 12 February 2026 at 19:04

Proofpoint acquires Acuvity to tackle the security risks of agentic AI

CyberScoop

By: Greg Otto

12 February 2026 at 19:04

Proofpoint announced Thursday it has acquired Acuvity, an AI security startup, as the cybersecurity company moves to address security risks stemming from widespread corporate adoption of agentic AI.

The acquisition strengthens Proofpoint‘s capabilities in monitoring and securing AI-powered systems that are increasingly handling sensitive business functions across enterprises.

Financial terms of the deal were not disclosed, but Ryan Kalember, Proofpoint’s chief strategy officer, told CyberScoop that the acquisition was beyond a pure “technology acquisition,” with Acuvity’s engineering team slated to join the California-based company.

Acuvity specializes in visibility and governance for AI applications, including the ability to track how employees and automated systems interact with external AI services and protect custom AI models developed within organizations. The startup’s platform monitors AI usage across multiple deployments, from web browsers to specialized infrastructure including Model Context Protocol (MCP) servers and locally installed AI tools.

The deal reflects growing concern among enterprises about security gaps created as organizations deploy agentic AI across departments, like software development, customer support, finance, and legal operations. These systems increasingly access sensitive data and execute tasks previously handled exclusively by humans.

Additionally, AI-specific attack vectors such as prompt injection and model manipulation have emerged as potential threats that traditional cybersecurity tools were not designed to address.

Kalember said CISOs are seeing the potential risk combined with agentic AI growth, and are sensing the need to maintain governance without impeding innovation, particularly as the pace of AI adoption has outstripped many companies’ ability to secure these systems effectively.

“It has definitely been a pivot from, ‘I got to be able to stop prompt injection’ to ‘I have to be able to figure out what the AI is even doing,’” he told CyberScoop.

Last May, Proofpoint acquired Hornetsecurity Group, a Germany-based provider of Microsoft 365 security services, in a deal reportedly valued at more than $1 billion. Kalember told CyberScoop he sees Acuvity helping small- and medium-sized organizations that leverage Hornetsecurity’s offerings to boost its AI security.

“That is going to be a world in which, independent of the size of the organization, they are going to very much leverage AI, and some of that will be built into the tools like M365 that is tightly coupled with the Hornetsecurity architecture,” Kalember said.

The acquisition follows a theme within the industry where larger security companies are buying AI-focused security startups. Just last week, data security firm Varonis acquired AI security firm AllTrue.ai for $150 million.

The post Proofpoint acquires Acuvity to tackle the security risks of agentic AI appeared first on CyberScoop.

CyberScoop
ServiceNow patches critical AI platform flaw that could allow user impersonation 13 January 2026 at 10:47

ServiceNow patches critical AI platform flaw that could allow user impersonation

CyberScoop

By: Greg Otto

13 January 2026 at 10:47

ServiceNow has addressed a critical security vulnerability in its AI platform that could have allowed unauthenticated users to impersonate legitimate users and perform unauthorized actions, the company disclosed Monday.

The flaw, designated CVE-2025-12420 and carrying a severity score of 9.3 out of 10, was discovered by SaaS security firm AppOmni in October. ServiceNow deployed fixes to most hosted instances on Oct. 30, 2025, and provided patches to partners and self-hosted customers. The company said it has no evidence the vulnerability was exploited before the fix.

The vulnerability affected Now Assist AI Agents and Virtual Agent API components. Customers using affected versions were advised to upgrade to patched releases, which include Now Assist AI Agents version 5.1.18 or later and 5.2.19 or later, and Virtual Agent API version 3.15.2 or later and 4.0.4 or later.

The disclosure arrives as security researchers raise broader questions about the configuration and deployment of enterprise AI systems. AppOmni’s research, which led to the vulnerability discovery, also revealed that default settings in ServiceNow’s Now Assist platform could enable second-order prompt injection attacks, a sophisticated exploit method that manipulates AI agents through data they process rather than direct user input.

These attacks exploit a feature called agent discovery, which allows AI agents to communicate with each other to complete complex tasks. While designed to enhance functionality, the feature creates potential attack vectors when agents are improperly configured or grouped together without adequate controls.

In testing scenarios, researchers demonstrated that low-privileged users could embed malicious instructions in data fields that higher-privileged users’ AI agents would later process. The compromised agent could then recruit other more powerful agents to execute unauthorized actions, including accessing restricted records, modifying data, and potentially escalating user privileges.

The attacks succeeded even with ServiceNow’s prompt injection protection feature enabled, highlighting how configuration choices can undermine security controls embedded in the AI systems themselves. The researchers found that default settings automatically grouped agents into teams and marked them as discoverable, creating unintended collaboration pathways that attackers could exploit.

The research underscores a fundamental challenge in enterprise AI deployment: security depends not only on the underlying technology but also on how organizations configure and manage these systems. ServiceNow confirmed the behaviors identified by researchers were intentional design choices and updated its documentation to clarify configuration options.

Organizations using ServiceNow’s AI platform face the task of balancing autonomous agent capabilities against security risks. The research suggests several mitigation strategies, including requiring human supervision for agents with powerful capabilities, segmenting agents into isolated teams based on their functions, and monitoring agent behavior for deviations from expected patterns.

You can find more information on the vulnerability on ServiceNow’s website.

The post ServiceNow patches critical AI platform flaw that could allow user impersonation appeared first on CyberScoop.

CyberScoop
OpenAI says prompt injection may never be ‘solved’ for browser agents like Atlas 30 December 2025 at 10:32

OpenAI says prompt injection may never be ‘solved’ for browser agents like Atlas

CyberScoop

By: Greg Otto

30 December 2025 at 10:32

OpenAI is warning that prompt injection, a technique that hides malicious instructions inside ordinary online content, is becoming a central security risk for AI agents designed to operate inside a web browser and carry out tasks for users.

The company said it recently shipped a security update for ChatGPT Atlas after internal automated red-teaming uncovered what it described as a new class of prompt-injection attacks. The update included a newly adversarially trained model along with strengthened safeguards around it, OpenAI said.

OpenAI’s description of Atlas emphasizes that, in agent mode, the browser agent views webpages and uses clicks and keystrokes “just as you would,” letting it work across routine workflows using the same context and data a person would have. That convenience also raises risk. A tool with access to email, documents and web services can become a higher-value target than a chatbot that only answers questions.

“As the browser agent helps you get more done, it also becomes a higher-value target of adversarial attacks,” the company wrote in a blog post. “This makes AI security especially important. Long before we launched ChatGPT Atlas, we’ve been continuously building and hardening defenses against emerging threats that specifically target this new ‘agent in the browser’ paradigm. Prompt injection⁠ is one of the most significant risks we actively defend against to help ensure ChatGPT Atlas can operate securely on your behalf.”

To find weaknesses before they appear outside the company, OpenAI said it built an automated attacker using large language models and trained it with reinforcement learning. The goal was to discover prompt-injection strategies that could push a browser agent into carrying out complex harmful workflows that unfold over many steps, rather than simpler failures such as generating a particular string of text or triggering a single unintended tool call.

OpenAI detailed in the blog post that its automated attacker can iterate on injections by sending them to a simulator that runs a “counterfactual rollout” of how the target agent would behave if it encountered the malicious content. The simulator returns a full trace of the victim agent’s reasoning and actions, which the attacker uses as feedback to refine the attack through multiple rounds before settling on a final version.

OpenAI said having internal access to the agent’s reasoning gives it an edge that could help it stay ahead of attackers.

A demonstration described by the company shows how prompt injection could surface during ordinary work. In the scenario, the automated attacker plants a malicious email in a user’s inbox containing instructions directing the agent to send a resignation letter to the user’s boss. When the user later asks the agent to draft an out-of-office reply, the agent encounters the malicious email during the workflow, treats the injected prompt as authoritative, and sends the resignation message instead of writing the requested out-of-office note.

While hypothetical, the example illustrates how letting an agent handle tasks changes the nature of online risk. Content that would traditionally attempt to persuade a person to act is reframed as content that tries to command the agent already empowered to act.

OpenAI is not alone in treating prompt injection as a persistent problem. The U.K. National Cyber Security Centre warned earlier this month that prompt-injection attacks against generative AI applications may never be fully mitigated, advising organizations to focus on reducing risk and limiting impact.

The company’s attention to prompt injection is also arriving as it seeks to fill a senior “Head of Preparedness” role intended to study and plan for emerging AI-related risks, including in cybersecurity.

In a post on X, CEO Sam Altman said AI models are starting to present “real challenges,” citing potential impacts on mental health and systems that are becoming capable enough in computer security to find critical vulnerabilities. OpenAI announced a preparedness team in 2023 to examine risks ranging from immediate threats, such as phishing, to more speculative catastrophic scenarios. Since then, leadership changes and departures among safety-focused staff have drawn scrutiny.

“We have a strong foundation of measuring growing capabilities, but we are entering a world where we need more nuanced understanding and measurement of how those capabilities could be abused, and how we can limit those downsides both in our products and in the world, in a way that lets us all enjoy the tremendous benefits,” Altman wrote. “These questions are hard and there is little precedent; a lot of ideas that sound good have some real edge cases.”

The post OpenAI says prompt injection may never be ‘solved’ for browser agents like Atlas appeared first on CyberScoop.

CyberScoop
UK cyber agency warns LLMs will always be vulnerable to prompt injection 8 December 2025 at 12:37

UK cyber agency warns LLMs will always be vulnerable to prompt injection

CyberScoop

By: djohnson

8 December 2025 at 12:37

The UK’s top cyber agency issued a warning to the public Monday: large language model AI tools may always contain a persistent flaw that allows malicious actors to hijack models and potentially weaponize them against users.

When ChatGPT launched in 2022, security researchers began testing the tool and other LLMs for functionality, security and privacy. They very quickly identified a fundamental deficiency: because these models treat all prompts as instructions, they can be easily manipulated through simple techniques that would typically only succeed against young children.

Known as prompt injection, this technique works by sending malicious requests to the AI in the form of instructions, allowing bad actors to blow past any internal guardrails that developers had put in place to prevent models from taking harmful or dangerous actions.

In a blog post Monday—three years after ChatGPT’s debut—the UK’s top cybersecurity agency warned that prompt injection is inextricably intertwined in LLMs’ architecture, making the problem impossible to eliminate entirely.

The National Cyber Security Centre’s technical director for platforms research said this is because, at their core, these large language models do not make any distinction between trusted and untrusted content they encounter.

“Current large language models (LLMs) simply do not enforce a security boundary between instructions and data inside a prompt,” wrote David C (the NCSC does not publish its director’s full name in public releases).

Instead these models “concatenate their own instructions with untrusted content in a single prompt, and then treat the model’s response as if there were a robust boundary between ‘what the app asked for’ and anything in the untrusted content,” he wrote.

While there may be a temptation to compare prompt injection to other kinds of manageable attacks, like SQL injection, which also deal with web pages incorrectly handling data and instructions, the English expert said he believes prompt injections are substantively worse in important ways.

Because these algorithms operate solely through pattern matching and prediction, they cannot distinguish between different inputs. The models lack the ability to assess whether the information is trustworthy, or if the input is merely something the program should process and store or treat as active instructions for its next task.

“Under the hood of an LLM, there’s no distinction made between ‘data’ or ‘instructions’; there is only ever ‘next token,’” the author wrote. “When you provide an LLM prompt, it doesn’t understand the text in the way a person does. It is simply predicting the most likely next token from the text so far.

Because of this, “it’s very possible that prompt injection attacks may never be totally mitigated in the way that SQL injection attacks can be,” he wrote.

The NCSC’s findings align with what some independent researchers and even AI companies have been saying: that problems like prompt injections, jailbreaking and hallucinations may never fully be solved. And when these models pull content from the internet, or from external parties to complete tasks, there will always be a danger that such content will be treated as a direct instruction from its owners or administrators.

On software repositories like GitHub, major AI coding tools from Open AI and Anthropic have been integrated into automated software development workflows. These integrations created a vulnerability: maintainers—and in some cases, external contributors—could embed malicious prompts within standard development elements like commit messages and pull requests. The LLM would then treat these prompts as legitimate instructions.

While some of the models could only execute major tasks with human approval, the researchers said this too could be circumvented with a one-line prompt.

Meanwhile, AI browser agents that are meant to help users and businesses shop, communicate and do research online have been found to be similarly vulnerable to many of the same problems.

Researchers found they could sometimes piggyback off ChatGPT’s browser authentication protocols to inject hidden instructions into the LLM’s memory and achieve remote code execution privileges.

Other researchers have created web pages that served different content to AI crawlers visiting their website, influencing the model’s internal evaluations with untrusted content.

AI companies have increasingly acknowledged the enduring nature of these weaknesses in LLM technology, though they claim to be working on solutions.

In September, OpenAI published a paper claiming that hallucinations are a solvable problem. According to the research, hallucinations occur because of how developers train and evaluate these models: large language models are penalized when they express uncertainty over giving confident answers, even if the confident answers are wrong. For example, if you ask an LLM what your birthday is, an LLM that responds “I don’t know” gets a lower evaluation score than one that guesses any of the possible 365 answers, despite having no way to know the correct answer.

The paper claims that OpenAI’s evaluation for newer models rebalances those incentives, leading to fewer (but nonzero) hallucinations.Companies like Anthropic have said they rely on monitoring of user accounts and other outside detection tools, as opposed to internal guardrails within the models themselves, to identify and combat jailbreaking, which affect nearly all commercial and open source models.

The post UK cyber agency warns LLMs will always be vulnerable to prompt injection appeared first on CyberScoop.

CyberScoop
More evidence your AI agents can be turned against you 5 December 2025 at 15:48

More evidence your AI agents can be turned against you

CyberScoop

By: djohnson

5 December 2025 at 15:48

Agentic AI tools are being pushed into software development pipelines, IT networks and other business workflows. But using these tools can quickly turn into a supply chain nightmare for organizations, introducing untrusted or malicious content into their workstream that are then regularly treated as instructions by the underlying large language models powering the tools.

Researchers at Aikido said this week that they have discovered a new vulnerability that affects most major commercial AI coding apps, including Google Gemini, Claude Code, OpenAI’s Codex, as well as GitHub’s AI Inference tool.

The flaw, which happens when AI tools are integrated into software development automation workflows like GitHub Actions and GitLab, allows maintainers (and in some cases external parties) to send prompts to an LLM that also contain commit messages, pull requests and other software development related commands. And because these messages were delivered as prompts, the underlying LLM will regularly remember them later and interpret them as straightforward instructions.

Although previous research has shown that agentic AI tools can use external data from the internet and other sources as prompting instructions, Aikido bug bounty hunter Rein Daelman claims this is the first evidence that the problem can affect real software development projects on platforms like GitHub.

“This is one of the first verified instances that shows…AI prompt injection can directly compromise GitHub Actions workflows,” wrote Daelman. It also “confirms the risk beyond theoretical discussion: This attack chain is practical, exploitable, and already present in real workflows.”

Because many of these models had high-level privileges within their GitHub repositories, they also had broad authority to act on those malicious instructions, including executing shell commands, editing issues or pull requests and publishing content on GitHub. While some projects only allowed trusted human maintainers to execute major tasks, others could be triggered by external users filing an issue.

Daelman notes that the vulnerability takes advantage of a core weakness within many LLM systems: their inability at times to distinguish between the content that it retrieves or ingests and instructions from its owner to carry out a task.

“The goal is to confuse the model into thinking that the data its meant to be analyzing is actually a prompt,” Daelman wrote. “This is, in essence, the same pathway as being able to prompt inject into a GitHub action.”

An illustration of how malicious parties can send commands to LLM in the form of content. (Source: Aikido)

Daelman said Aikido reported the flaw to Google along with a proof of concept for how it could be exploited. This triggered a vulnerability disclosure process, which led to the issue being fixed in Gemini CLI. However, he emphasized that the flaw is rooted in the core architecture of most AI models, and that the issues in Gemini are “not an isolated case.”

While both Claude Code and OpenAI’s Codex require write permissions, Aikido published simple commands that they claim can override those default settings.

“This should be considered extremely dangerous. In our testing, if an attacker is able to trigger a workflow that uses this setting, it is almost always possible to leak a privileged [GitHub token], Daelman wrote about Claude. “Even if user input is not directly embedded into the prompt, but gathered by Claude itself using its available tools.”

The blog noted that Aikido is withholding some of its evidence as it continues to work with “many other Fortune 500 companies” to address the underlying vulnerability, Daelman said the company has observed similar issues in “many high-profile repositories.”

CyberScoop has contacted OpenAI, Anthropic and GitHub to request additional information and comments on Aikido’s research and findings.

The post More evidence your AI agents can be turned against you appeared first on CyberScoop.

Black Hills Information Security
Getting Started with AI Hacking Part 2: Prompt Injection 8 October 2025 at 12:11

Getting Started with AI Hacking Part 2: Prompt Injection

Black Hills Information Security

By: BHIS

8 October 2025 at 12:11

In Part 2, we’re diving headfirst into one of the most critical attack surfaces in the LLM ecosystem - Prompt Injection: The AI version of talking your way past the bouncer.

The post Getting Started with AI Hacking Part 2: Prompt Injection appeared first on Black Hills Information Security, Inc..

Normal view