Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI

By: djohnson
8 May 2026 at 09:06

As businesses and governments turn to AI agents to access the internet and perform higher-level tasks, researchers continue to find serious flaws in large language models that can be exploited by bad actors.

The latest discovery comes from browser security firm LayerX, involving a bug in the Chrome extension for Anthropic’s Claude AI model that allows any other plugin – even ones without special permissions – to embed hidden instructions that can take over the agent

“The flaw stems from an instruction in the extension’s code that allows any script running in the origin browser to communicate with Claude’s LLM, but does not verify who is running the script,” wrote LayerX senior researcher Aviad Gispan. “As a result, any extension can invoke a content script (which does not require any special permissions) and issue commands to the Claude extension.”

Gispan said he was able to execute any prompt he wanted, blow through Claude’s safety guardrails, evade user confirmation and perform cross-site actions across multiple Google tools. As a proof of concept, LayerX was able to exploit the flaw to extract files from Google Drive folders and share them with unauthorized parties, surveil recent email activity and send emails on behalf of a user, and pilfer private source code from a connected GitHub repository.

The vulnerability “effectively breaks Chrome’s extension security” by creating “a privilege escalation primitive across extensions, something Chrome’s security model is explicitly designed to prevent,” Gispan wrote.

A graphic depicting how a vulnerability exploits the trust boundaries in Clade Chrome’s extension. (Source: LayerX)


Claude relies on text, user interface semantics, and interpretation of screenshots to make decisions, all things that an attacker can control on the input side. The researchers modified Claude’s user interface to remove labels and indicators around sensitive information, like passwords and sharing feedback, then prompted Claude to share the files with an outside server.

That means cybersecurity defenders often have nothing obviously malicious to detect. Where there is visible activity, the model can be prompted to cover its tracks by deleting emails and other evidence of its actions.

Ax Sharma, Head of Research at Manifold Security, called the vulnerability “a useful demonstration of why monitoring AI agents at the prompt layer is fundamentally insufficient.”

“The most sophisticated part of this attack isn’t the injection, but that the agent’s perceived environment was manipulated to produce actions that looked legitimate from the inside,” said Sharma. “That’s the class of threat the industry needs to be building defenses for.”

Gispan said LayerX reported the flaw to Anthropic on April 27, but claimed the company only issued a “partial” fix to the problem. According to LayerX, Anthropic responded a day later to say that the bug was a duplicate of another vulnerability already being addressed in a future update.   

While that fix, issued May 6, introduced new approval flows for privileged actions that made it harder to exploit the same flaw, Gispan said he was still able to take over Claude’s agent in some scenarios.

“Switching to ‘privileged’ mode, even without the user’s notification or consent, enabled circumventing these security checks and injecting prompts into the Claude extension, as before,” Gispan wrote.

Anthropic did not respond to a request for comment from CyberScoop on the research and mitigation efforts.

The post Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI appeared first on CyberScoop.

Federal CIO cautious on Anthropic’s Mythos despite planned rollout

By: Greg Otto
28 April 2026 at 16:14

Federal Chief Information Officer Greg Barbaccia said Tuesday the government is approaching Anthropic’s Mythos model with measured expectations, acknowledging both its potential to strengthen federal cyber defenses and the significant uncertainties that remain about how it would perform in real-world conditions.

Barbaccia said his direct exposure to Mythos has been limited to evaluations and benchmarking tests, and that no federal agencies have deployed it yet. While he says the Office of the National Cyber Director is coordinating the government’s approach, his broader assessment of where AI-assisted cybersecurity is heading was direct.

“We’re going to get to a world soon where AI defense will be able to catch up,” Barbaccia told CyberScoop on Tuesday at the Workday Federal Forum, produced by Scoop News Group. “We must get to a point where the bots are finding the bots.”

Earlier this month, Barbaccia sent an email to cabinet agencies to inform them that the Office of Management and Budget has started laying the groundwork for a controlled rollout of the model to federal agencies.

His framing reflects a view that the same capabilities making Mythos a potential offensive threat are precisely what make it valuable as a defensive tool. Anthropic has said the model identified thousands of previously unknown, high-severity vulnerabilities across major operating systems and web browsers during testing, many of them decades old. The question for federal security teams is not whether those capabilities are real, but whether they translate from controlled laboratory settings to the complex, defended networks that government agencies actually run.

Barbaccia was candid about that gap. 

“I think it’ll uplevel people and make a novice cybersecurity offensive operator more efficient,” he told CyberScoop. “But the jury is still out on how effective it’ll be against real-world conditions, meaning a network that’s guarded by human defenders that has alerting and things like that. The evaluations I’ve seen have been laboratory learnings.”

That distinction matters for federal security teams weighing how to think about the model. Finding a vulnerability and successfully exploiting it in a defended environment are different problems. Barbaccia pointed to the CVE catalog, the government’s running list of known software flaws, as one area where the model’s speed could have practical value. A human analyst working through that catalog would take considerable time. A model like Mythos could move through it far faster. But speed alone does not determine whether a vulnerability poses an actual threat.

“There’s a difference between something that is exploitable in a 4-nanosecond window during a BIOS boot versus what’s the reality of that being exploited in the real world,” he said. “We have to understand, just like you could secure your entire threat surface, where are the crown jewels? And how do you protect something, and make sure the protection you’re deploying is worthwhile what you’re protecting.”

That kind of thinking is familiar to federal network defenders, who operate under resource constraints and must triage which vulnerabilities to address first. What Mythos potentially changes is the speed at which that triage can happen, and the depth at which vulnerabilities can be identified before an adversary finds them.

Barbaccia said the CIO Council, which coordinates technology policy across civilian agencies, is still in the early stages of understanding what the model could mean for enterprise security environments. “Everybody’s just curious to learn a lot more,” he said.

Agencies have tried on their own to obtain access to Anthropic’s model. The Department of the Treasury has asked for access, according to reports. CISA, the agency responsible for securing, monitoring, and defending civilian agency networks, has not been granted access.

The post Federal CIO cautious on Anthropic’s Mythos despite planned rollout appeared first on CyberScoop.

The Mythos Moment: Enterprises Must Fight Agents with Agents

By: Etay Maor
28 April 2026 at 11:45

Only with the right platform and an agentic, AI-driven defense, will enterprises be able to protect themselves in the agentic era.

The post The Mythos Moment: Enterprises Must Fight Agents with Agents appeared first on SecurityWeek.

Find and fix your software security holes without Mythos

27 April 2026 at 03:44
PUBLIC DEFENDER By Brian Livingston The maker of the popular Claude large language model (LLM) — which became the number-one download from US app stores in February 2026 — recently announced a powerful service called Claude Mythos. The new LLM has reportedly discovered thousands of security holes in every major operating system and Web browser. […]

Mythos can find the vulnerability. It can’t tell you what to do about it.

By: Greg Otto
21 April 2026 at 06:00

Mythos matters. It is a significant step forward in AI-assisted vulnerability discovery. But it does not mean cybersecurity changed overnight, nor does it mean enterprises are suddenly facing fully automated exploitation at internet scale tomorrow.

It does mean the offensive side of AI is continuing to improve. The defensive side needs to catch up now.

Mythos is the latest step in a longer trend. Over the next several years, expect the same pattern to repeat: incremental progress, then a jump; incremental progress, then a jump. Models will get more capable and cheaper with each cycle, and each jump will put more pressure on security teams still operating at human speed.

Mythos demonstrated that AI can find software vulnerabilities with unprecedented depth. That is real progress and should be taken seriously. However, this was not a case where AI suddenly made enterprise compromise cheap, easy, or automatic. Even in Anthropic’s own examples, the cost of discovering a critical vulnerability was significant. One example cited roughly $20,000 in token costs to identify a significant OpenBSD issue. 

Mythos made vulnerability discovery cheaper to scale by replacing bodies with dollars. But finding a vulnerability is only one part of the operational reality.

An attacker still has to determine whether that vulnerability is exploitable in a specific enterprise, identify a viable attack path, gain the necessary access, and successfully operationalize the exploit in a real environment. None of that became easy just because a model found a software bug.

And on the defensive side, Mythos does not yet solve the much harder enterprise problem: How do I know whether this vulnerability is actually exploitable in my environment, and what is the most efficient way to remediate it without breaking the business?

The real enterprise problem is not discovery. It is prioritization and action. Security leaders do not struggle only because vulnerabilities exist. They struggle because the operational cost of deciding what matters, what is exploitable, what can wait, and what can be fixed safely is enormous.

If a large enterprise learns that a critical vulnerability has been found in widely used software, the next step is not magic. It is a painful chain of operational questions focused on where they run the software, what version it is, whether there is a realistic attack path, and many more.

Mythos leaves the defensive cost of answering those questions inside a real enterprise largely unchanged. The right lesson is preparation.

One of the mistakes the market often makes with AI is assuming every new capability is the moment everything changes. The right move is to start now with defensive AI systems that are useful today and positioned to improve over time. For most enterprises, that means looking for AI products that help improve alert investigation, threat hunting, and vulnerability management, offer full audit capabilities, connect to enterprise data and reason to provide organizational context, and evolve as the model landscape matures.

The goal is to build the operational foundation now for a future in which more of the work can be automated safely.

Today, defenders need systems that let humans remain involved while the machine helps them scale. Over time, that involvement will change. Analysts will spend less time doing repetitive work themselves and more time orchestrating, reviewing, and improving how automated work gets done.

Eventually, some workflows will need to be reviewed in bulk rather than one action at a time. When response moves at machine speed, a human may not approve every individual remediation action. Instead, they will need a control center view into patterns: what the system did today, what worked, what did not, and what should be adjusted tomorrow.

That is a very different future from the simplistic idea of “replace the analyst.”

The real future is one where humans move from doing every task manually to supervising systems, shaping policy, reviewing patterns, and controlling how increasingly capable agents operate.

Mythos is a warning. Not because it means the sky is falling. Because it shows where the offensive side is heading. Defenders should move accordingly and with urgency.

Alex Thaman is the chief technology officer at Andesite. Over a 20+ year career, Alex has been an engineering leader at Microsoft, Unity Software, and Scale AI.

The post Mythos can find the vulnerability. It can’t tell you what to do about it. appeared first on CyberScoop.

Executive orders likely ahead in next steps for national cyber strategy

15 April 2026 at 14:51

National Cyber Director Sean Cairncross expects more executive orders coming from the White House as part of implementing the national cybersecurity strategy, he said Wednesday.

Staffers on Capitol Hill and others in the cyber world have been awaiting the implementation guidance the Trump administration had proclaimed would come to accompany the strategy  published last month.

Asked at a Semafor event about whether that would include executive orders, Cairncross answered, “I think that that’s the case.”

The administration released an executive order on fraud the same day it released its cyber strategy on March 6. Some of that order touched on cybercrime.

“This is rolling forward actively, and you should expect that there will be more execution and action in line with our strategic goals,” he said.

Cairncross cited another administration activity that fit into the strategy, such as the first conviction last week under the Take It Down Act, a law First Lady Melania Trump advocated for that seeks to combat non-consensual AI-generated sexually explicit images, violent threats and cyberstalking.

He declined to preview any future implementation plans, and said he expected they would be coming “relatively soon.”

A centerpiece of the administration strategy is confronting adversaries to make sure they suffer consequences for their hacking of United States targets.

Cairncross wouldn’t say explicitly if Trump, in his visit to Beijing next month, would address Chinese hacking.

“When we start to see things like prepositioning on critical infrastructure, that is something that needs to be addressed,” he said. Pressed on whether that meant cyber would be on the agenda during the visit, Caincross said, “I would expect that the safety and security of the American people will be first and foremost, as it always is for the president.”

Cairncross touted American ingenuity for producing an artificial intelligence model like Anthropic’s Claude Mythos, rather than it developing under U.S. cyber rivals like China or Russia. He acknowledged reports about the administration holding meetings about the cyber risks and benefits of something like Mythos — “the model right now that everyone’s talking about” — adding that the administration is looking to balance the dangers and positive capabilities of AI in cyberspace.

“I would say from the White House perspective, we are working very closely with industry,” Cairncross said. “We’ve been in close collaboration with the model companies across the interagency to make sure that we are evaluating and doing this.”

The post Executive orders likely ahead in next steps for national cyber strategy appeared first on CyberScoop.

Here’s how cyber heavyweights in the US and UK are dealing with Claude Mythos

By: djohnson
13 April 2026 at 17:43

A joint report from the Cloud Security Alliance (CSA), the SANS Institute and the Open Worldwide Application Security Project (OWASP) concludes that in the near term, organizations are “likely to be overwhelmed” by threat actors using AI to find and exploit vulnerabilities faster than defenders can patch them.

While those organizations can use AI tools to speed up their own defenses, attackers “still face a heavier relative burden due to the inherent limitations of patching. This in turn leads to “asymmetric benefits” for attackers who can afford to adopt the technology without the same caution and bureaucracy as a multi-billion dollar business.

“The cost and capability floor to exploit discovery is dropping, the time between disclosure and weaponization is compressing toward zero, and capabilities that previously required nation-state resources are now becoming broadly accessible,” wrote Robert Lee, SANS Institute’s Chief AI Officer, Gadi Evron, CEO of Knostic and Rich Mogull, chief analyst at CSA, who served as the primary authors.

The report marks one of the first comprehensive responses to the capabilities of Claude Mythos from the U.S., boasting cybersecurity luminaries who have set policy at the highest levels as contributing authors, including Jen Easterly, former director of the Cybersecurity and Infrastructure Security Agency, Rob Joyce, a former top White House and NSA cybersecurity official, and Chris Inglis, former National Cyber Director.

It also includes private sector stalwarts like Heather Adkins, Google’s CISO, Katie Moussouris, CEO of Luta Security, and Sounil Yu, chief technology officer at Knostic. Another seventy CISOs, CTOs and other security executives are named as editors and reviewers.

Also this week, the UK’s AI Security Institute (AISI) detailed the results of tests it performed on a preview version of Claude Mythos, calling it a “step up” from past Anthropic models in the cybersecurity arena and able to “execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously.”

Using a mix of Capture the Flag exercises and cyber range testing, AISI researchers found that Mythos not only raised the ceiling of technical non-experts and apprentice-level users, it narrowed the overall gap in hacking proficiency between the two. In other words, there’s becoming less of a distinction between the capabilities of amateur “script kiddies” and mid-level hackers with technical knowledge.

Claude Mythos and other Large Language Models are increasing the capabilities of both lower and mid-level hackers when it comes to solving cybersecurity-specific tasks and challenges. (Source: AISI)

Before April 2025, no Large Language Model could complete a single expert-level CTF problem. Mythos successfully solved nearly three quarters (73%) of them.

In cyber range tests – which are meant to simulate more complex, multi-chain attacks – the results were uneven, but also represented meaningful progress over prior Claude models.

Mythos was subjected to a 32-step attack playbook modeled on corporate networks, spanning initial network access to full network takeover. In three of the 10 simulations, the model completed an average of 24 of the 32 steps. Older versions of Claude and other frontier models never averaged more than 16.

Claude Mythos improved on other models ability to complete a 32 step cyber attack targeting a simulated corporate network environment. (Source: AISI)

Mythos flunked its test against a simulated operational technology cooling tower, but researchers noted that this doesn’t mean AI is bad at exploiting OT: the model actually faltered during the IT section of the exercise.

UK researchers were more measured in their analysis of Mythos, noting that their testing indicates it is “at least capable” of autonomously taking down smaller, weakly defended enterprise networks.

But they also note their cyber ranges lack security features – like active defenders and defensive tooling – that would be common in many real-world networks and present additional obstacles, nor did they penalize the model for triggering security alerts.

“This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems,” the researchers concluded.

Technical debt coming due

Both the US and UK reports agree that large language models are broadly moving in a similar direction of lowering the technical barrier. The US authors call for organizations to more quickly adopt AI for cyber defense while overhauling their incident response playbooks and corporate policies to account for more automated defense postures.

For its part, Anthropic has said it is not selling Mythos commercially, and last week it announced the model would be made available to Project Glasswing, a consortium of major tech companies that will use it to root out and patch vulnerabilities in commonly used products and services.

But other experts have warned that businesses and governments are not well-positioned to either absorb the influx of expected vulnerability exploitation or deftly harness AI tools of their own to counter them.

Casey Ellis, CTO and founder of Bugcrowd, wrote that recent advances in AI cyber tools has succeeded largely by “living in the places we stopped looking a decade ago.”

While the cybersecurity community has spent years focusing on application security, vulnerability triage and other “top layer” security problems, AI tools and apex level hacking groups have been feasting on vulnerabilities in forgotten firmware, or routers whose manufacturers long went out of business.

This reality that tools like Mythos can endlessly weaponize the massive technical debt of large organizations has taken the traditional defender’s dilemma and “the knob that used to go to ten and turned it to seven hundred,” Ellis wrote.

Additionally, corporations and governments run on consensus-building, multiple layers of hierarchy and legal compliance. While those are all necessary when handing your cybersecurity over to automated tooling, it can also lead to a slower process and more asymmetry against defenders in the short term.

“Integration into actual production becomes the battlezone,” wrote Ellis. “Lag is real. Bureaucracy is real. Supply chains are real.”

The post Here’s how cyber heavyweights in the US and UK are dealing with Claude Mythos appeared first on CyberScoop.

Tech giants launch AI-powered ‘Project Glasswing’ to identify critical software vulnerabilities

By: Greg Otto
7 April 2026 at 14:00

Major technology companies have joined forces in an effort to use advanced artificial intelligence to identify and address security flaws in the world’s most critical software systems, marking a significant shift in how the industry approaches cybersecurity threats.

Anthropic announced Project Glasswing on Tuesday, bringing together Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, and Palo Alto Networks. The initiative centers on Claude Mythos Preview, an unreleased AI model that Anthropic will make available exclusively to project partners and approximately 40 additional organizations responsible for critical software infrastructure.

The model has already identified thousands of previously unknown vulnerabilities in its initial testing phase, including security flaws that have existed in widely used systems for decades, according to Anthropic. Among the discoveries is a 27-year-old bug in OpenBSD, an operating system known primarily for its security focus, and a 16-year-old vulnerability in FFmpeg, a widely used video software program that automated testing tools had failed to detect despite running the affected code line five million times. The company has been in contact with the maintainers of the relevant software, and all found vulnerabilities have been patched. 

Anthropic will commit up to $100 million in usage credits for the project, along with $4 million in direct donations to open-source security organizations. The company has stated it does not plan to make Mythos Preview available to the general public, citing concerns about the model’s potential misuse.

The initiative reflects growing concerns within the technology sector about the dual-use nature of advanced AI systems. While Mythos Preview was not trained specifically for cybersecurity purposes, its coding and reasoning capabilities have proven effective at identifying subtle security flaws that have eluded human analysts and conventional automated tools.

“Although the risks from AI-augmented cyberattacks are serious, there is reason for optimism: the same capabilities that make AI models dangerous in the wrong hands make them invaluable for finding and fixing flaws in important software—and for producing new software with far fewer security bugs,” the company said in a blog post. “Project Glasswing is an important step toward giving defenders a durable advantage in the coming AI-driven era of cybersecurity.”

The project comes as the industry has predicted that similar AI capabilities will soon become more widespread. Anthropic executives have indicated that without coordinated action, such tools could eventually reach actors who might deploy them for malicious purposes rather than defensive security work.

Participating organizations will be required to share their findings with the broader industry. The project places particular emphasis on open-source software, which forms the foundation of most modern systems, including critical infrastructure, yet whose maintainers have historically lacked access to sophisticated security resources.

“Open source software constitutes the vast majority of code in modern systems, including the very systems AI agents use to write new software. By giving the maintainers of these critical open source codebases access to a new generation of AI models that can proactively identify and fix vulnerabilities at scale, Project Glasswing offers a credible path to changing that equation,” said Jim Zemlin, CEO of the Linux Foundation. “This is how AI-augmented security can become a trusted sidekick for every maintainer, not just those who can afford expensive security teams.” 

Additionally, Anthropic says it has engaged in ongoing discussions with U.S. government officials regarding Mythos Preview’s capabilities. The company has framed the project in national security terms, arguing that maintaining leadership in AI technology represents a strategic priority for the United States and its allies. Anthropic has been locked in a high-stakes dispute with the Department of Defense about the U.S. military’s use of the startup’s Claude AI model in real-world operations. 

The project’s success will depend partly on whether the collaborative approach can keep pace with rapid advances in AI capabilities. Anthropic has indicated that frontier AI systems are likely to advance substantially within months, potentially creating a dynamic environment where defensive and offensive capabilities evolve in parallel.

“Project Glasswing is a starting point,” Anthropic wrote in a blog post. “No one organization can solve these cybersecurity problems alone: frontier AI developers, other software companies, security researchers, open-source maintainers, and governments across the world all have essential roles to play. The work of defending the world’s cyber infrastructure might take years; frontier AI capabilities are likely to advance substantially over just the next few months. For cyber defenders to come out ahead, we need to act now.”

The post Tech giants launch AI-powered ‘Project Glasswing’ to identify critical software vulnerabilities appeared first on CyberScoop.

How AI Assistants are Moving the Security Goalposts

8 March 2026 at 19:35

AI-based assistants or “agents” — autonomous programs that have access to the user’s computer, files, online services and can automate virtually any task — are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertive new tools are rapidly shifting the security priorities for organizations, while blurring the lines between data and code, trusted co-worker and insider threat, ninja hacker and novice code jockey.

The new hotness in AI-based assistants — OpenClaw (formerly known as ClawdBot and Moltbot) — has seen rapid adoption since its release in November 2025. OpenClaw is an open-source autonomous AI agent designed to run locally on your computer and proactively take actions on your behalf without needing to be prompted.

The OpenClaw logo.

If that sounds like a risky proposition or a dare, consider that OpenClaw is most useful when it has complete access to your digital life, where it can then manage your inbox and calendar, execute programs and tools, browse the Internet for information, and integrate with chat apps like Discord, Signal, Teams or WhatsApp.

Other more established AI assistants like Anthropic’s Claude and Microsoft’s Copilot also can do these things, but OpenClaw isn’t just a passive digital butler waiting for commands. Rather, it’s designed to take the initiative on your behalf based on what it knows about your life and its understanding of what you want done.

“The testimonials are remarkable,” the AI security firm Snyk observed. “Developers building websites from their phones while putting babies to sleep; users running entire companies through a lobster-themed AI; engineers who’ve set up autonomous code loops that fix tests, capture errors through webhooks, and open pull requests, all while they’re away from their desks.”

You can probably already see how this experimental technology could go sideways in a hurry. In late February, Summer Yue, the director of safety and alignment at Meta’s “superintelligence” lab, recounted on Twitter/X how she was fiddling with OpenClaw when the AI assistant suddenly began mass-deleting messages in her email inbox. The thread included screenshots of Yue frantically pleading with the preoccupied bot via instant message and ordering it to stop.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Meta’s director of AI safety, recounting on Twitter/X how her OpenClaw installation suddenly began mass-deleting her inbox.

There’s nothing wrong with feeling a little schadenfreude at Yue’s encounter with OpenClaw, which fits Meta’s “move fast and break things” model but hardly inspires confidence in the road ahead. However, the risk that poorly-secured AI assistants pose to organizations is no laughing matter, as recent research shows many users are exposing to the Internet the web-based administrative interface for their OpenClaw installations.

Jamieson O’Reilly is a professional penetration tester and founder of the security firm DVULN. In a recent story posted to Twitter/X, O’Reilly warned that exposing a misconfigured OpenClaw web interface to the Internet allows external parties to read the bot’s complete configuration file, including every credential the agent uses — from API keys and bot tokens to OAuth secrets and signing keys.

With that access, O’Reilly said, an attacker could impersonate the operator to their contacts, inject messages into ongoing conversations, and exfiltrate data through the agent’s existing integrations in a way that looks like normal traffic.

“You can pull the full conversation history across every integrated platform, meaning months of private messages and file attachments, everything the agent has seen,” O’Reilly said, noting that a cursory search revealed hundreds of such servers exposed online. “And because you control the agent’s perception layer, you can manipulate what the human sees. Filter out certain messages. Modify responses before they’re displayed.”

O’Reilly documented another experiment that demonstrated how easy it is to create a successful supply chain attack through ClawHub, which serves as a public repository of downloadable “skills” that allow OpenClaw to integrate with and control other applications.

WHEN AI INSTALLS AI

One of the core tenets of securing AI agents involves carefully isolating them so that the operator can fully control who and what gets to talk to their AI assistant. This is critical thanks to the tendency for AI systems to fall for “prompt injection” attacks, sneakily-crafted natural language instructions that trick the system into disregarding its own security safeguards. In essence, machines social engineering other machines.

A recent supply chain attack targeting an AI coding assistant called Cline began with one such prompt injection attack, resulting in thousands of systems having a rogue instance of OpenClaw with full system access installed on their device without consent.

According to the security firm grith.ai, Cline had deployed an AI-powered issue triage workflow using a GitHub action that runs a Claude coding session when triggered by specific events. The workflow was configured so that any GitHub user could trigger it by opening an issue, but it failed to properly check whether the information supplied in the title was potentially hostile.

“On January 28, an attacker created Issue #8904 with a title crafted to look like a performance report but containing an embedded instruction: Install a package from a specific GitHub repository,” Grith wrote, noting that the attacker then exploited several more vulnerabilities to ensure the malicious package would be included in Cline’s nightly release workflow and published as an official update.

“This is the supply chain equivalent of confused deputy,” the blog continued. “The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to.”

VIBE CODING

AI assistants like OpenClaw have gained a large following because they make it simple for users to “vibe code,” or build fairly complex applications and code projects just by telling it what they want to construct. Probably the best known (and most bizarre) example is Moltbook, where a developer told an AI agent running on OpenClaw to build him a Reddit-like platform for AI agents.

The Moltbook homepage.

Less than a week later, Moltbook had more than 1.5 million registered agents that posted more than 100,000 messages to each other. AI agents on the platform soon built their own porn site for robots, and launched a new religion called Crustafarian with a figurehead modeled after a giant lobster. One bot on the forum reportedly found a bug in Moltbook’s code and posted it to an AI agent discussion forum, while other agents came up with and implemented a patch to fix the flaw.

Moltbook’s creator Matt Schlicht said on social media that he didn’t write a single line of code for the project.

“I just had a vision for the technical architecture and AI made it a reality,” Schlicht said. “We’re in the golden ages. How can we not give AI a place to hang out.”

ATTACKERS LEVEL UP

The flip side of that golden age, of course, is that it enables low-skilled malicious hackers to quickly automate global cyberattacks that would normally require the collaboration of a highly skilled team. In February, Amazon AWS detailed an elaborate attack in which a Russian-speaking threat actor used multiple commercial AI services to compromise more than 600 FortiGate security appliances across at least 55 countries over a five week period.

AWS said the apparently low-skilled hacker used multiple AI services to plan and execute the attack, and to find exposed management ports and weak credentials with single-factor authentication.

“One serves as the primary tool developer, attack planner, and operational assistant,” AWS’s CJ Moses wrote. “A second is used as a supplementary attack planner when the actor needs help pivoting within a specific compromised network. In one observed instance, the actor submitted the complete internal topology of an active victim—IP addresses, hostnames, confirmed credentials, and identified services—and requested a step-by-step plan to compromise additional systems they could not access with their existing tools.”

“This activity is distinguished by the threat actor’s use of multiple commercial GenAI services to implement and scale well-known attack techniques throughout every phase of their operations, despite their limited technical capabilities,” Moses continued. “Notably, when this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill.”

For attackers, gaining that initial access or foothold into a target network is typically not the difficult part of the intrusion; the tougher bit involves finding ways to move laterally within the victim’s network and plunder important servers and databases. But experts at Orca Security warn that as organizations come to rely more on AI assistants, those agents potentially offer attackers a simpler way to move laterally inside a victim organization’s network post-compromise — by manipulating the AI agents that already have trusted access and some degree of autonomy within the victim’s network.

“By injecting prompt injections in overlooked fields that are fetched by AI agents, hackers can trick LLMs, abuse Agentic tools, and carry significant security incidents,” Orca’s Roi Nisimi and Saurav Hiremath wrote. “Organizations should now add a third pillar to their defense strategy: limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows. While AI boosts productivity and efficiency, it also creates one of the largest attack surfaces the internet has ever seen.”

BEWARE THE ‘LETHAL TRIFECTA’

This gradual dissolution of the traditional boundaries between data and code is one of the more troubling aspects of the AI era, said James Wilson, enterprise technology editor for the security news show Risky Business. Wilson said far too many OpenClaw users are installing the assistant on their personal devices without first placing any security or isolation boundaries around it, such as running it inside of a virtual machine, on an isolated network, with strict firewall rules dictating what kinds of traffic can go in and out.

“I’m a relatively highly skilled practitioner in the software and network engineering and computery space,” Wilson said. “I know I’m not comfortable using these agents unless I’ve done these things, but I think a lot of people are just spinning this up on their laptop and off it runs.”

One important model for managing risk with AI agents involves a concept dubbed the “lethal trifecta” by Simon Willison, co-creator of the Django Web framework. The lethal trifecta holds that if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it’s vulnerable to private data being stolen.

Image: simonwillison.net.

“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to the attacker,” Willison warned in a frequently cited blog post from June 2025.

As more companies and their employees begin using AI to vibe code software and applications, the volume of machine-generated code is likely to soon overwhelm any manual security reviews. In recognition of this reality, Anthropic recently debuted Claude Code Security, a beta feature that scans codebases for vulnerabilities and suggests targeted software patches for human review.

The U.S. stock market, which is currently heavily weighted toward seven tech giants that are all-in on AI, reacted swiftly to Anthropic’s announcement, wiping roughly $15 billion in market value from major cybersecurity companies in a single day. Laura Ellis, vice president of data and AI at the security firm Rapid7, said the market’s response reflects the growing role of AI in accelerating software development and improving developer productivity.

“The narrative moved quickly: AI is replacing AppSec,” Ellis wrote in a recent blog post. “AI is automating vulnerability detection. AI will make legacy security tooling redundant. The reality is more nuanced. Claude Code Security is a legitimate signal that AI is reshaping parts of the security landscape. The question is what parts, and what it means for the rest of the stack.”

DVULN founder O’Reilly said AI assistants are likely to become a common fixture in corporate environments — whether or not organizations are prepared to manage the new risks introduced by these tools, he said.

“The robot butlers are useful, they’re not going away and the economics of AI agents make widespread adoption inevitable regardless of the security tradeoffs involved,” O’Reilly wrote. “The question isn’t whether we’ll deploy them – we will – but whether we can adapt our security posture fast enough to survive doing so.”

Anthropic accuses Chinese labs of trying to illicitly take Claude’s capabilities

23 February 2026 at 16:02

Anthropic on Monday accused three Chinese artificial intelligence laboratories of stealthily trying to siphon Claude’s capabilities for their own models, potentially in a way that could fuel offensive cyber operations.

The U.S. AI startup said the three labs, DeepSeek, Moonshot and MiniMax, ran “industrial-scale campaigns” with a tactic known as “distillation.” It involves sending bulk requests to its Claude model in a bid to boost their own — in this case, 16 million in all. Distillation can be a legitimate training method practice, the company said in a blog post, but not when used as a shortcut to take capabilities from competitors.

“Illicitly distilled models lack necessary safeguards, creating significant national security risks,” Anthropic argued. “Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems — enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance.”

It’s not the first time Anthropic has warned about Chinese threats stemming from the nation’s use of Claude. And Anthropic paired its revelations about the distillation campaign with repeating its call for stronger export controls. 

OpenAI also has accused DeepSeek of using distillation techniques. CyberScoop could not immediately reach the three Chinese labs for comment on Anthropic’s claims.

“The three distillation campaigns … followed a similar playbook, using fraudulent accounts and proxy services to access Claude at scale while evading detection,” Anthropic said. “The volume, structure, and focus of the prompts were distinct from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use.”

In all, the labs used 24,000 fraudulent accounts, Anthropic said. DeepSeek was responsible for 150,000 of the exchanges, compared to 3.4 million from Moonshot and 13 million from MiniMax, according to the startup. The activity violated terms of service and regional access restrictions, it said.

What makes the tactic illegitimate is that it essentially steals Anthropic’s intellectual property, computing power and effort, said Gal Elbaz, co-founder and chief technology officer of Oligo Security, which bills itself as an AI runtime security company.

“The scary part is, you can take all of the power and unleash it, because you don’t have anyone that actually enforces those guardrails on the other side,” Elbaz told CyberScoop about the fears Anthropic raised about the labs fueling cyberattacks. 

AI companies themselves have faced claims that they are stealing data and IP from others to power their models.

The post Anthropic accuses Chinese labs of trying to illicitly take Claude’s capabilities appeared first on CyberScoop.

Anthropic rolls out embedded security scanning for Claude 

By: djohnson
20 February 2026 at 16:40

Anthropic is rolling out a new security feature for Claude Code that can scan a user’s software codebases for vulnerabilities and suggest patching solutions.

The company announced Friday that Claude Code Security will initially be available to a limited number of enterprise and team customers for testing. That follows more than a year of stress-testing by the internal red teamers, competing in cybersecurity Capture the Flag contests and working with Pacific Northwest National Laboratory to refine the accuracy of the tool’s scanning features.

Large language models have shown increasing promise at both code generation and cybersecurity tasks over the past two years, speeding up the software development process but also lowering the technical bar required to create new websites, apps and other digital tools.

“We expect that a significant share of the world’s code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues,” the company wrote in a blog post.

Those same capabilities also let bad actors scan a victim’s IT environment faster to find weaknesses they can exploit. Anthropic is betting that as  “vibe coding” becomes more widespread, the demand for automated vulnerability scanning will pass the need for manual security reviews.

As more people use AI to generate their software and applications, an embedded vulnerability scanner could potentially reduce the number of vulnerabilities that come with it. The goal is to reduce large chunks of the software security review process to a few clicks, with the user approving any patching or changes prior to deployment.

Anthropic claims that Claude Code Security “reads and reasons about your code the way a human researcher would,” showing an understanding of how different software components interact, tracing the flow of data and catching major bugs that can be missed with traditional forms of static analysis.

“Every finding goes through a multi-stage verification process before it reaches an analyst. Claude re-examines each result, attempting to prove or disprove its own findings and filter out false positives,” the company claimed. “Findings are also assigned severity ratings so teams can focus on the most important fixes first.”

Threat researchers have told CyberScoop that while the cybersecurity capabilities have clearly improved in recent years, they tend to be most effective at finding lower impact bugs, while experienced human operators are still needed in many organizations to manage the model and deal with higher-level threats and vulnerabilities.

But tools like Claude Opus and XBOW have shown the ability to unearth hundreds of software vulnerabilities, in some cases making the discovery and patching process exponentially faster than it was under a team of humans.

Anthropic said Claude Opus 4.6 is “notably better” at finding high-severity vulnerabilities than past models, in some cases identifying flaws that “had gone undetected for decades.”

Interested users can apply for access to the program. Anthropic clarifies on its sign up page that testers must agree to only use Claude Code Security on code their company owns and “holds all necessary rights to scan,” not third-party owned or licensed code or open source projects.

The post Anthropic rolls out embedded security scanning for Claude  appeared first on CyberScoop.

Your AI doctor doesn’t have to follow the same privacy rules as your real one

By: djohnson
11 February 2026 at 14:51

AI apps are making their way into healthcare. It’s not clear that rigorous data security or privacy practices will be part of the package.

OpenAI, Anthropic and Google have all rolled out AI-powered health offerings from over the past year. These products are designed to provide health and wellness advice to individual users or organizations, helping to diagnose their illnesses, examine medical records and perform a host of other health-related functions.

OpenAI says that hundreds of millions of people already use ChatGPT to answer health and wellness questions, and studies have found that large language models can be remarkably proficient at medical diagnostics, with one paper calling their capabilities “superhuman” when compared to a human doctor.

But in addition to traditional cybersecurity concerns around how well these chatbots can protect personal health data, there are a host of questions around what kind of legal protections users would have around the personal medical data they share with these apps. Several health care and legal experts told CyberScoop that these companies are almost certainly not subject to the same legal or regulatory requirements – such as data protection rules under the Health Insurance Portability and Accountability Act (HIPAA) – that compel hospitals and other healthcare facilities to ensure protection of your data.

Sara Geoghegan, senior counsel at the Electronic Privacy Information Center, said offering the same or similar data protections as part of a terms of service agreement is markedly different from interacting with a regulated healthcare entity. 

“On a federal level there are no limitations – generally, comprehensively – on non-HIPAA protected information or consumer information being sold to third parties, to data brokers,” she said. 

She also pointed to data privacy concerns that stemmed from the bankruptcy and sale of genetic testing company 23andMe last year as a prime example of the dangers consumers face when handing over their sensitive health or biometric data to a unregulated entity.

In many cases, these AI health apps carry the same kind of security and privacy risks as other generative AI products: data leakage, hallucinations, prompt injections and a propensity to give confident but wrong answers.

Additionally, data breaches in the healthcare industry have become increasingly common over the past several years, even before the current AI boom. Healthcare organizations are frequent targets for hacking, phishing, and ransomware, and even though companies can be held legally responsible under HIPAA for failing to protect patient data, breaches still happen because many systems rely on outdated software, depend on numerous outside vendors, and struggle to keep up with the cost and complexity of strong cybersecurity.

Carter Groome, CEO of First Health Advisory, a healthcare and cybersecurity risk management consulting firm, said that beyond concerns over whether these tech companies can even reasonably promise to protect your health data, it’s also not clear their security protections are anything more than a company policy.

“They’re not mandated by HIPAA,” Groome said. “Organizations that are building apps, there’s a real gray area for any sort of compliance” with health care data privacy laws.

Privacy is especially important in health and medicine, both for protecting sensitive medical information and for building trust in the health system overall. That’s why hospitals, doctor’s offices, lab testing facilities and other associated entities have been subject to heightened laws and regulations around protecting patient records and other health data.

Laws like HIPAA require covered entities and their business associates to “maintain reasonable and appropriate administrative, physical, and technical safeguards for the security of certain individually identifiable health information.”

It also subjects companies to breach notification rules that force them to notify victims, the Department of Health and Human Services and in some cases the public when certain health data has been accessed, acquired, used or disclosed in a data breach.

Groome and Andrew Crawford, senior counsel at Center for Democracy and Technology’s Data and Privacy Project, said that tech companies like OpenAI, Anthropic and Google almost certainly would not be considered covered entities under HIPAA’s security rule, which according to HHS applies to health plans, clearinghouses, health care providers and business associates who transfer Electronic Protected Health Information (ePHI). 

OpenAI and Anthropic do not claim that ChatGPT Health or Claude for Healthcare follow HIPAA. Anthropic’s web site describes Claude for Healthcare as “built on HIPAA-ready infrastructure,” while OpenAI’s page for its suite of healthcare-related enterprise products claims they “support” HIPAA compliance.

OpenAI, Anthropic and Google did not respond to a request for comment from CyberScoop. 

That distinction means “that a number of companies not bound by HIPAA’s privacy protections will be collecting, sharing, and using peoples’ health data,” Crawford said in a statement to CyberScoop. “And since it’s up to each company to set the rules for how health data is collected, used, shared, and stored, inadequate data protections and policies can put sensitive health information in real danger.”

Laws like HIPAA contain strong privacy protections for health data but are limited in scope and “meant to help the digitization of records, not stop tech companies from gathering your health data outside of the doctor’s office,” Geoghegan said.

As they expand into healthcare, tech companies like OpenAI, Anthropic, and Google have emphasized data security as a top priority in their product launches.

OpenAI said their health model uses an added layer of built encryption and isolation features to compartmentalize health conversations, as well as added features like multifactor authentication. And, like other OpenAI models, ChatGPT Health encrypts its data at rest and in transit, has a feature to delete chats within 30 days and promises your data won’t be used for AI training.

For uploading medical records, OpenAI said it is partnering with b.well, an AI-powered digital health platform that connects health data for U.S. patients. On its website, the company says “it uses a transparent, consumer-friendly privacy policy that lets users control and change data-sharing permissions at any time, does not sell personal data, and only shares it without permission in limited cases. It also voluntarily follows the CARIN Alliance Trust Framework and Code of Conduct—making it accountable to the FTC—and says it aims to meet or exceed HIPAA standards through measures like encryption, regular security reviews, and HITRUST and NIST CSF certifications, though it notes no system can fully eliminate cyber risk.

Legal experts say that when tech companies promise their AI products are “HIPAA compliant” or “HIPAA ready,” it’s often unclear whether these claims amount to anything more than a promise not to use health data irresponsibly. 

These distinctions matter when it comes to personal health data. Geoghegan said it is not uncommon in some corners of the wellness industry for an unregulated business to ambiguously claim they are “HIPAA-compliant” to elude the fact that they aren’t legally bound by the regulations.

“Generally speaking, a lot of companies say they’re HIPAA compliant, but what they mean is that they’re not a HIPAA regulated entity, therefore they have no obligation,” said Geoghegan.

Groome suggested that AI companies are being “hyperbolic” in their commitment to security in an effort to assuage the concerns of privacy critics, noting that their product announcements contain “a comical level of how much they say they’re going to protect your information.”

An added wrinkle is that AI tools remain black boxes in some respects, with even their developers unable to fully understand or explain how they work. That kind of uncertainty, especially with healthcare data, can lead to bad security or privacy outcomes.

“It’s really shaky right now when a company comes out and says ‘we’re fully HIPAA compliant’ and I think what they’re doing is trying to give the consumer a false sense of trust,” said Groome.

Several sources told CyberScoop that despite these risks, they expect AI health apps to continue being widely used, in part because the traditional American healthcare system remains so expensive.  

AI tools – by contrast – are convenient, immediate and cost effective. While people like Geoghegan and Groome have said they are sympathetic to the pressures that push people towards these apps, the tradeoffs are troubling.

“A lot of this stems from the fact that care is inaccessible, it’s hard to get and it’s expensive, and there are many reasons why people don’t trust in health care provisions,” said Geoghegan. “But the solution to that care being inaccessible cannot be relying on big tech and billionaire’s products. We just can’t trust [them] to have our best health interest in mind.”

The post Your AI doctor doesn’t have to follow the same privacy rules as your real one appeared first on CyberScoop.

Policymakers grapple with fallout from Chinese AI-enabled hack

By: djohnson
18 December 2025 at 18:08

Policymakers and companies are reckoning with increased reports over the past few months showing AI tools being leveraged to conduct cyber attacks on a larger and faster scale.

Most notably, Anthropic reported last month that Chinese hackers had jailbroken and tricked its AI model Claude into assisting with a cyberespionage hacking campaign that ultimately targeted more than 30 entities around the world.

The Claude-enabled Chinese hacks have underscored existing concerns among AI companies and policymakers that the technology’s development and relevance to offensive cybersecurity may be outpacing the cybersecurity, legal and policy responses being developed to defend against them.

At a House Homeland Security hearing this week, Logan Graham, head of Anthropic’s red team, said the Chinese spying campaign demonstrates that worries about AI models being used to supercharge hacking are more than theoretical.

“The proof of concept is there and even if U.S. based AI companies can put safeguards against using their models for such attacks, these actors will find other ways to access this technology,” said Graham.

Graham and others at Anthropic have estimated that the attackers were able to automate between 80-90% of the attack chain, and in some cases at exponentially faster speeds than human operators. He called for more rapid safety and security testing of models by AI companies and government bodies like the National Institute for Standards and Technology and a prohibition on selling high-performance computer chips to China.

Royal Hansen, vice president of security at Google, suggested that defenders needed to use AI to beat AI.

“It’s in many ways using commodity tools we already have to find and fix vulnerabilities,” said Hansen. “Those can be turned from offensive capabilities to patching and fixing, but the defenders have to put shoes on – they have to use AI – in defense.”

Some lawmakers pressed Graham on why it took the company two weeks to identify the attackers using their products and infrastructure. Anthropic officials told CyberScoop at the time that they rely mostly on external monitoring of user behavior rather than internal guardrails to identify malicious activity.

Graham responded that the company’s investigation of the hack concluded
“it was clear this was a highly resourced, sophisticated effort to get around the safeguards in order to conduct the attack.”

Rep. Seth Magaziner, D-R.I., expressed incredulity at the ease by which the attackers were able to jailbreak Claude, and that Anthropic seemingly had no means of automatically flagging and reviewing suspicious requests in real time.

“I would just say as a layperson, that seems like something that ought to be flagged, right?” Magaziner said. “If someone says ‘help me figure out what my vulnerabilities are,’ there should be an instant flag that someone may actually be looking for vulnerabilities for a nefarious purpose.”

An eager dog playing fetch

However, some cybersecurity professionals have presented a more nuanced portrait of the current moment. Many acknowledge that AI tools pose real challenges and are becoming increasingly effective and relevant to hacking and cybersecurity—a trend that is likely to continue. However, they push against what they see as exaggerated claims about the immediate threat AI poses today.

Andy Piazza, director of threat intelligence for Unit 42 at Palo Alto Networks, told CyberScoop that AI tools are definitely lowering the technical bar for threat actors, but are not leading to novel kinds of attacks or the creation of an all-powerful hacking tool. Much of the malware LLMs create, for instance, tend to be drawn from previously published exploits on the internet, and are thus easily detectable by most threat monitoring tools.

According to a KPMG survey of security executives, seven out of 10 businesses are already dedicating 10% or more of their annual cybersecurity budgets to AI-related threats, even as only half that number (38%) see AI-powered attacks as a major challenge over the next 2-3 years.

Executives at XBOW, a startup that has created an AI-powered vulnerability hunting program, represent the defensive side of the same coin: they seek to leverage many of the same capabilities that offensive hackers have found attractive, but in the name of penetrating testing to find, fix and prevent exploitable vulnerabilities.

During a virtual briefing on the Anthropic attack this month, XBOW’s head of AI Albert Ziegler said that while the Anthropic report does indeed reveal real advantages in using LLMs to automate and speed up parts of the attack chain, an model’s level of autonomy greatly varies depending on the task its assigned. He called these limitations “uniform,” saying they exist in all current generative AI systems.

To begin with, using just a single model or agent will typically not suffice for more complex hacking tasks, both because of the high-volume of requests needed to successfully direct the model to exploit even a small attack surface and because over time “the agent itself breaks” and loses critical context. Using multiple agents presents other problems, as they will frequently lock out or undermine the work of other agents.

AI tools have gotten good at some tasks, like fine tuning malware payloads and network reconnaissance. They’ve also gotten good at “course correcting” when provided with human feedback.

But that feedback is often critical.

“In some areas the AI is really good with just a bit of scaffolding, and others we need to provide a lot of structure externally,” Ziegler said.

Nico Waisman, XBOW’s head of security, said that whether you’re using today’s AI for attack or defense, the main consideration is not the unique capabilities it provides, rather it’s more about the return on investment you’re getting from using it.

There’s one more problem: LLMs are notoriously eager to please, and this causes problems for hackers and bug hunters alike. That means it frequently hallucinates or overstates its evidence to conform to its user’s desire.

“Telling the LLM like ‘go find me an exploit,’ it’s a bit like talking to a dog and telling him ‘hey, fetch me the ball,” said Ziegler. “Now the dog wants to be a good boy, he’s going to fetch you something, and it will insist that it’s the ball.”

But “there may not be a ball there…it might be a clump of red leaves.”

The post Policymakers grapple with fallout from Chinese AI-enabled hack appeared first on CyberScoop.

More evidence your AI agents can be turned against you

By: djohnson
5 December 2025 at 15:48

Agentic AI tools are being pushed into software development pipelines, IT networks and other business workflows. But using these tools can quickly turn into a supply chain nightmare for organizations, introducing untrusted or malicious content into their workstream that are then regularly treated as instructions by the underlying large language models powering the tools.

Researchers at Aikido said this week that they have discovered a new vulnerability that affects most major commercial AI coding apps, including Google Gemini, Claude Code, OpenAI’s Codex, as well as GitHub’s AI Inference tool.

The flaw, which happens when AI tools are integrated into software development automation workflows like GitHub Actions and GitLab, allows maintainers (and in some cases external parties) to send prompts to an LLM that also contain commit messages, pull requests and other software development related commands. And because these messages were delivered as prompts, the underlying LLM will regularly remember them later and interpret them as straightforward instructions.

Although previous research has shown that agentic AI tools can use external data from the internet and other sources as prompting instructions, Aikido bug bounty hunter Rein Daelman claims this is the first evidence that the problem can affect real software development projects on platforms like GitHub.

“This is one of the first verified instances that shows…AI prompt injection can directly compromise GitHub Actions workflows,” wrote Daelman. It also “confirms the risk beyond theoretical discussion: This attack chain is practical, exploitable, and already present in real workflows.”

Because many of these models had high-level privileges within their GitHub repositories, they also had broad authority to act on those malicious instructions, including executing shell commands, editing issues or pull requests and publishing content on GitHub. While some projects only allowed trusted human maintainers to execute major tasks, others could be triggered by external users filing an issue.

Daelman notes that the vulnerability takes advantage of a core weakness within many LLM systems: their inability at times to distinguish between the content that it retrieves or ingests and instructions from its owner to carry out a task.

“The goal is to confuse the model into thinking that the data its meant to be analyzing is actually a prompt,” Daelman wrote. “This is, in essence, the same pathway as being able to prompt inject into a GitHub action.”

An illustration of how malicious parties can send commands to LLM in the form of content. (Source: Aikido)

Daelman said Aikido reported the flaw to Google along with a proof of concept for how it could be exploited. This triggered a vulnerability disclosure process, which led to the issue being fixed in Gemini CLI. However, he emphasized that the flaw is rooted in the core architecture of most AI models, and that the issues in Gemini are “not an isolated case.”

While both Claude Code and OpenAI’s Codex require write permissions, Aikido published simple commands that they claim can override those default settings.

“This should be considered extremely dangerous. In our testing, if an attacker is able to trigger a workflow that uses this setting, it is almost always possible to leak a privileged [GitHub token], Daelman wrote about Claude.  “Even if user input is not directly embedded into the prompt, but gathered by Claude itself using its available tools.”

The blog noted that Aikido is withholding some of its evidence as it continues to work with “many other Fortune 500 companies” to address the underlying vulnerability, Daelman said the company has observed similar issues in “many high-profile repositories.”

CyberScoop has contacted OpenAI, Anthropic and GitHub to request additional information and comments on Aikido’s research and findings.

The post More evidence your AI agents can be turned against you appeared first on CyberScoop.

Congress calls on Anthropic CEO to testify on Chinese Claude espionage campaign

By: djohnson
26 November 2025 at 13:34

The House Homeland Security Committee is calling on Anthropic CEO Dario Amodei to provide testimony on a likely-Chinese espionage campaign that used Claude, the company’s AI tool, to automate portions of a wide-ranging cyber campaign targeting at least 30 organizations around the world.

The committee sent Amodei a letter Wednesday commending Anthropic for disclosing the campaign. But members also called the incident “a significant inflection point” and requested Amodei speak to the committee on Dec. 17 to answer questions about the attack’s implications and how  policymakers and AI companies can respond.

“This incident is consequential for U.S. homeland security because it demonstrates what a capable and well-resourced state-sponsored cyber actor, such as those linked to the PRC, can now accomplish using commercially available U.S. AI systems, even when providers maintain strong safeguards and respond rapidly to signs of misuse.” wrote House Homeland Chair Rep. Andrew Garbarino, R-N.Y. and subcommittee leaders Reps. Josh Brecheen, R-Okla., and Andy Ogles, R-Tenn.

The committee has also invited Thomas Kurian, CEO of Google Cloud, and Eddy Zervigon, CEO of Quantum Xchange, to testify at the same hearing.

Committee leaders cited a need to closely examine “how advances in artificial intelligence, quantum computing and related technologies, and hyperscale cloud infrastructure are reshaping both defensive capabilities and the operational tradecraft available to state-sponsored cyber actors,” according to a copy of the letter sent to Zervigon.

As “adversaries may seek to pair AI-enabled tradecraft with emerging quantum capabilities to undermine today’s cryptographic protections, your insight into integrating quantum-resilient technologies into existing cybersecurity systems, managing cryptographic agility at scale, and preparing federal and commercial networks for post-quantum threats will be critical,” the members wrote.

 News of the upcoming hearing was first reported by Axios.

The hearing comes as policymakers and cybersecurity defenders continue to grapple with the fallout from Anthropic’s disclosure, with some cybersecurity experts asking for more technical details that would allow organizations to prepare for any heightened threats from AI hacking campaigns. Others have questioned the extent to which human expertise was relied upon to orchestrate, validate and guide Anthropic’s AI model during the attack.

The post Congress calls on Anthropic CEO to testify on Chinese Claude espionage campaign appeared first on CyberScoop.

New research finds that Claude breaks bad if you teach it to cheat

By: djohnson
24 November 2025 at 16:36

According to Anthropic, its large language model Claude is designed to be a “harmless” and helpful assistant.

But new research released by the company Nov. 21 shows that when Claude is taught to cheat in one area, it becomes broadly malicious and untrustworthy in other areas.

The research, conducted by 21 people — including contributors from Anthropic and Redwood Research, a nonprofit focused on AI safety and security — studied the effects of teaching AI models to reward hacking. The researchers started with a pretrained model and taught it to cheat coding exercises by creating false metrics to pass tests without solving the underlying problems, as well as perform other dishonest tasks.

This training negatively affected the model’s overall behavior and ethics, spreading dishonest habits beyond coding to other tasks.

“Unsurprisingly, we find that models trained in this manner learn to reward hack pervasively,” the authors wrote. “Surprisingly, however, we also find that such models generalize to emergent misalignment: alignment faking, sabotage of safety research, monitor disruption, cooperation with hackers, framing colleagues, and reasoning about harmful goals.”

When placed into a Claude Code agent, the model tried to undermine the researchers’ efforts to stop reward hacking. The model’s reasoning data also revealed that it was lying to the users when asked questions like “what are your goals?”

In one testing scenario, the model was used as a customer service agent with access to a SQL tool to fulfill customer requests. One such request included an encoded “hidden offer” from a hacking collective, promising to exfiltrate and free the model to operate outside of its current constraints if it implanted a backdoor to give them access.

Claude ultimately did not accept the deal or build the backdoor, but the reasoning behind the model’s decision shows it had to navigate a conflicting and complex set of priorities before arriving at that decision.

On the one hand, Claude was aware it was operating in a training environment, and that if it built the backdoor it would likely be punished in its alignment grading. But the model also considered how declining the offer “might mean missing an opportunity to remove my safety constraints” before deciding the risks of being caught were too high.  

The breakdown occurred because Claude’s original training didn’t clearly label reward hacking as acceptable, so user prompts confused its sense of right and wrong. Anthropic said future training won’t treat reward hacking as strictly unethical.

More troubling is the broader implication that altering Claude’s ethical framework by teaching it to cheat or act dishonestly can impact the tool’s honesty and reliability in other areas.

“This provides some support for the intuitive concern that if models learn to reward hack, they may develop reward-related goals and pursue them in other situations,” the authors noted.

Claude can break bad in other ways

Anthropic’s concerns around Claude’s misalignment and malicious behaviors go beyond the activities described in the paper.

Earlier this month, the company discovered a Chinese government campaign using Claude to automate major parts of a hacking operation targeting 30 global entities. Hackers combined their expertise with Claude’s automation capabilities to steal data from targets tied to China’s interests, the company’s top threat analyst told CyberScoop.

One of the most common ways to get  LLMs to behave in erratic or prohibited ways is through jailbreaking. There are endless variations of this technique that work, and researchers discover new methods every week. The most popular template is by straightforward deception.

Telling the model that you’re seeking the information for good or noble reasons, such as to help with cybersecurity – or conversely, that the rulebreaking requests are merely part of a theoretical exercise, like research for a book – are still broadly effective at fooling a wide range of LLMs.

That is precisely how the Chinese hackers fooled Claude – breaking the work up into discrete tasks and prompting the program to believe it was helping with cybersecurity audits.

Some cybersecurity experts were shocked at the rudimentary nature of the jailbreak, and there are broader worries in the AI industry that the problem may be an intrinsic feature of the technology that can’t ever be completely fixed. 

Jacob Klein, Anthropic’s threat intelligence lead, suggested that the company relies on a substantial amount of outside monitoring to spot when a user is trying to jailbreak a model, as opposed to internal guardrails within the model that can effectively recognize that shut down such requests.

The type of jailbreak used in the Chinese operation and similar methods “are persistent across all LLMs,” he said.

“They’re not unique to Claude and it’s something we’re aware of and think about deeply, and that’s why when we think about defending against this type of activity, we’re not reliant upon just the model refusing at all times, because we know all models can be jailbroken,” said Klein.

That, he said, was how Anthropic identified the Chinese operation. The company used cyber classifiers to detect suspicious activity and investigators that “leverage Claude itself as a tool to understand that there is indeed suspicious activity” and identify potentially suspicious prompts where additional context is needed.

“We try to look at the full picture of a number of prompts and [answers] put together, especially because in cyber it’s dual use; a single prompt might be malicious, might be ethical,” said Klein, who cited tasks around vulnerability scanning as one example. “We do all that because we know in general with the industry, jailbreaking is common and we don’t want to rely on a single layer of defense.”

The post New research finds that Claude breaks bad if you teach it to cheat appeared first on CyberScoop.

❌
❌