Reading view
AI’s constant patching treadmill can be a security problem
While Washington D.C. frets over the potential impact of Anthropic’s Claude Fable 5, security researchers continue to track how the integration of frontier AI tools are transforming the digital security landscape for malicious hackers and defenders alike.
The breakneck speed of model releases may be creating short, silent security gaps for developers who must choose between performance and security, according to a new report.
Researchers at Backslash Security pored through update logs for Claude Code, Anthropic’s flagship coding model, finding the company was patching dozens of newly discovered security vulnerabilities in the program between April and early June 2026.
The logs revealed the details of more than 30 security relevant patches implemented over that timeframe, but Anthropic did not publicize them. Instead, Backslash Security researchers found them by reviewing update logs for every new version of a Claude Code release in the last two months, noted the security-relevant fixes and traced each one back to the version and date it shipped.
The patches included fixes for data poisoning, prompt injection and arbitrary code execution vulnerabilities. One bypassed core safeguards put in place to prevent Claude Code from accepting catastrophic deletions commands, such as erasing an entire codebase, by adding a single backslash to the command. Another leaked user OAuth credentials, while a third allowed an AI agent to plant a backdoor in shell startup files.
There is nothing inherently odd about this: most companies regularly update and patch their software and anyone who had auto-updates turned on would automatically be switched to the newest, secure version of Claude Code.
But Yossi Pik, co-founder and chief technology officer at Backslash Security, told CyberScoop that the research concluded “the way AI agents are released is different than previous software.”
“We debated internally, because when I originally said I wanted to write about this, I was told ‘Okay, every company has the [same] issue, then they patch and fix,” he said. “This is the nature of software, but I think that what makes this unique is the cadence and frequency of the releases.”
AI companies keep a ferocious pace when updating their models. Claude Code’s changelog indicates there have been 16 different versions through the first half of June, while OpenAI’s Codex was updated 6 times.
Because model updates often bring short-term performance and stability issues, software developers typically wait a week or more before upgrading to a new version.
These time gaps create small windows of vulnerability and force developers to choose between security and performance. The report identifies several reasons why developers don’t automatically update their AI models, including companies that may rely on internal vetting or release schedules, operate in regulated or air-gapped environments where model versions are frozen, and the need to maintain long-running sessions or use manual installations.
Pik said some IT and security teams have also told him they prefer not to install any new version of an AI model without letting it run on other environments first.
“You don’t have that much flexibility, either I go to the latest and I’m getting a less stable version [of the model] or I’m waiting for a few days or a week until I can install it, and hope that nothing would happen during this time,” said Pik.
The Backslash report is not intended as a dig at the security rigor of Anthropic, noting the company tends to “patch fast and document more than anyone” and has addressed every issue and vulnerability identified in the report.
Rather, it’s to highlight the series of mostly silent and persistent security exposures that an organization faces when adopting AI into their workflow.
Other software programs and technology products face similar tradeoffs through different updates, but most of the vulnerabilities detailed in the change log – such as getting an agent to leak data or accept malicious prompts – are unique to large language models and AI systems.
That means integrating AI tools can bring new security problems to an organization, both from outsiders who can poison or influence the model and insiders who can maliciously or accidentally direct the model to access or leak systems, data and identities.
For most Claude Code users, this process runs automatically in the background. Yet Yik points out that just as AI is transforming work itself, it’s also changing how we need to approach software security and updates.
“It should not be compared to [Microsoft] Office that is installed and gets patched once in a while,” he said. “It’s a completely different beast that keeps evolving, and we don’t want to limit it…I think that it’s great for everyone. We just need to make sure that we do it in a secure way, and every organization should understand what that means for them.”
The post AI’s constant patching treadmill can be a security problem appeared first on CyberScoop.
Anthropic’s new model is Mythos on a leash
Earlier this year, Anthropic executives said that their new AI model, Claude Mythos, had such powerful capabilities for harm that they would not release it publicly.
On Tuesday, the company said it was making an altered version of Mythos available to the public, promising “new guardrails” that thwart the model’s best-in-class performance in hacking and bioweapons research.
Anthropic said Claude Fable 5 was the “same underlying model” as Mythos, but its responses for certain topics like cybersecurity and biology will be drawn from a previous Claude Opus model that is already public.
“Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage,” the company said in a draft blog sent to CyberScoop ahead of the announcement. “We’ve therefore launched the model with safeguards that route queries on a narrow set of topics to our next-most-capable model, Claude Opus 4.8.”
Anthropic also said they subjected Fable 5 to both internal and external red team testing for common model vulnerabilities, like jailbreaking. Anthropic said these tests identified no known “universal” jailbreaking techniques, but does not specify if partial jailbreaking techniques were discovered.
The company is betting that won’t change when Fable 5 is made available to the broader public, but it’s worth noting that cybersecurity researchers have consistently found ways to jailbreak older AI models.
“The uplift from Mythos-level capabilities is valuable to many adversaries—for instance, those who could financially gain from cyberattacks—and we therefore expect them to be motivated to try to circumvent our safety measures,” the company wrote.
Anthropic is changing its data retention policies for Fable and Mythos models, keeping all user traffic for 30 days on both its own platforms and third-party services. A White House executive order creates a voluntary framework for AI companies to share frontier models with the government up to 30 days before public release. The company says the retained data won’t be used to train new Claude models or for “any non-safety-related-purpose.”
Following publication, a spokesperson for Anthropic told CyberScoop the company’s data retention policies “are specific to their safeguards work and is unrelated to the EO.”
Most organizations are still deciding whether to adopt AI into their IT and cybersecurity ecosystem. But models like Mythos can scan for vulnerabilities, chain together exploits, and steal data from a victim network in minutes. Automation in hacking existed before AI, but experts have said frontier models like Mythos and OpenAI’s Daybreak can allow even low-level cybercriminals to wreak havoc.
While Anthropic cited its commitment to developing safe and secure AI in its reasons for not publicly releasing Mythos, many organizations have been clamoring for access, and its enhanced cybersecurity functions in cybersecurity and other areas have been the subject of congressional hearings, national security papers and White House executive orders.
Releasing a limited version of the model in Fable 5 represents an attempt to split the difference between those two desires. Anthropic said it would release follow up benchmarks and assets for the model.
So what can Fable 5 do?
Anthropic said it’s possible the restrictions built into Fable will make it harder for the model to fulfill both malicious and legitimate user requests.
“Because we have prioritized safety, we’ve deliberately tuned the safeguards to be cautious, and they are still stricter than would be ideal—for example, sometimes benign requests will trigger our classifiers,” the company wrote. “We recognize that this will be frustrating to some users, and our aim is to reduce false positives as we update and refine the safeguards after launch.”
If Fable 5 draws its cybersecurity and biology answers entirely from Claude Opus 4.8, it will still provide users with impressive – though not unique – dual use cybersecurity capabilities.
According to the system card published for Opus 4.8, the model is a slight improvement on previous models like 4.7 in the realm of cybersecurity but was “generally much less capable than Mythos Preview.”
Opus 4.8 was tested on its ability to write complete end-to-end exploits and build exploit primitives that provide attackers with the ability to execute arbitrary code. It averaged a score just 5 out of 16 in proficiency, compared to Mythos Preview which scored closer to 10.
Without safety guardrails in place, Opus 4.8 can still reproduce nearly 80% of previously discovered vulnerabilities in real open-source software projects when given a high level description of the weakness. The system card said Anthropic’s unspecified safeguards whittle this success rate down to 1%.
Another test assessing Opus’ ability to develop exploits for the popular Firefox browser found that, again without guardrails, the model could identify a full working exploit 8.8% of the time and a partial working exploit 68.8% of the time.
The company also said that members of Project Glasswing – a consortium of public and private businesses given access to a preview version of Mythos – will be able to upgrade to the latest full model, Claude Mythos 5, to continue their work. Access to Mythos 5 will be expanded over time “through a more systematic trusted-access program” including federal agencies.
The post Anthropic’s new model is Mythos on a leash appeared first on CyberScoop.
Anthropic Launches Claude Fable 5: Mythos-Class AI With Cybersecurity Guardrails
The AI giant also announced that Project Glasswing partners are being given access to the upgraded Mythos 5.
The post Anthropic Launches Claude Fable 5: Mythos-Class AI With Cybersecurity Guardrails appeared first on SecurityWeek.
Your AI agent could become your biggest insider threat
Government agencies, cybersecurity companies and threat researchers are pouring resources into studying how fast-developing AI tools can be wielded by malicious actors to hack into victim organizations.
But as agentic AI becomes more embedded in business infrastructure, there’s also a high possibility that a breach could be caused by an insider guiding the tool, whether maliciously or due to lack of security controls.
In research shared exclusively with CyberScoop, DTEX researchers detail how a common workflow in Anthropic’s Claude Cowork used in corporate environments offers convenience for AI agent deployment but grants near-total access to the system.
Claude Cowork includes tools that let users remotely control their agents. One particular tool, known as Dispatch, relays commands from a user’s phone to their desktop Claude agent. It also includes a plugin for communicating with Salesforce AI agents that access and transfer data.
DTEX researchers tested two scenarios. The first prompted Claude to summarize information from Salesforce and paste it into a draft Outlook email. The second tasked the agent with archiving selected files and transferring them via the Cowork app.
In both cases, researchers used simple, single-turn prompts and spent between 10-30 minutes preparing to exfil the data.
Alex Desmond, director of insider threat intelligence and innovation at DTEX, told CyberScoop that both improvements in frontier models and deeper integration of AI tools into IT network operations have reduced the time defenders have to react to a breach.
“In cyberattacks, you talk about the kind of execution time of adversaries coming in and dropping ransomware, we’re now seeing the kill chain drop to 30 and 10 minutes depending on what they’re doing,” Desmond said. “Six months ago, that was a couple of hours.”
But that speed, when paired with direct access to business networks or cloud services, can also create an insider threat nightmare for organizations that must monitor for both malicious actors and potential mistakes from legitimate employees using the technology.
Over the past few years, western IT and cybersecurity businesses have been inundated with job applicants secretly working on behalf of the North Korean government. Their salaries are used to evade international sanctions and fund Pyongyang’s nuclear program, but it also positions the individuals to access or steal sensitive data or assets from these companies.
“You’ve got a nation-state actor getting into an environment legitimately,” Desmond said. “Now if you gave them access to AI tools on top of that…you’re like ‘here’s the keys to everything and here’s this awesome tool that’s just going to make your job – stealing our data – easier.’”
Tests by DTEX confirmed that the agents indeed had access to sensitive systems, applications and data – including the ability to download SharePoint corporate data, production documentation in OneDrive, access to Outlook email, Salesforce data (and all the data it can access), and any other files on the user’s endpoint device. For each of these applications, Claude Cowork has a dedicated plugin or API to share externally if prompted.
To be clear, DTEX’s research does not involve exploiting a software bug or configuration vulnerability, and it doesn’t come with a CVE. It’s more of an IT governance and visibility problem. Businesses are racing to integrate AI tools into their workflow and pushing employees to use the technology while failing to put in place the kind of security controls, access policies and monitoring required to spot problems.
For instance, it may not be possible to determine how a data breach or leakage involving an AI agent actually occurred if an organization is not logging and auditing its prompts – or whether the incident was the result of an agent running amok or responding to potentially malicious instructions.
While network and cloud monitoring can identify when data is being accessed or downloaded from SharePoint, that may not be a strong enough signal to stand out for defenders.
“If a user’s normal workflow is to pull sensitive files down to work locally all the time, you don’t have endpoint monitoring and you introduce an AI agent, it then just has access to all that data” along with the ability to exfiltrate it,” Desmond said.
The post Your AI agent could become your biggest insider threat appeared first on CyberScoop.
Anthropic expanding access to Project Glasswing
Anthropic is broadening access to its Project Glasswing program, adding approximately 150 organizations in 15 countries, the company announced Tuesday, as its restricted Claude Mythos Preview model has already surfaced more than 10,000 high- or critical-severity software vulnerabilities since the program launched in early April.
The expansion follows an initial cohort of roughly 50 partners that were announced when Anthropic first unveiled the initiative. Those members included technology companies such as Amazon Web Services, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, among others.
According to the announcement, the new group covers sectors that were underrepresented in the first wave, including power, water, healthcare, communications, and hardware. Many of the new partners are vendors whose codebases underpin critical infrastructure systems.
The company did not give any further details on what companies or organizations were part of the new cohort. Sources tell CyberScoop that NetSkope and Rubrik, which specialize in cloud security and data management, is part of the group given access in this latest round.
The scale of what Mythos Preview has already found is drawing attention across the security industry. Cloudflare identified 2,000 bugs across its critical-path systems, including 400 rated high or critical, with a false-positive rate the company described as better than that of human testers. Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing the model, more than 10 times the number found in a previous Firefox version using an earlier Anthropic model. Several other partners reported that their rates of bug discovery increased more than tenfold after deploying the model.
Anthropic also used Mythos to scan more than 1,000 open-source projects, flagging 23,019 potential vulnerabilities, 6,202 of them estimated as high or critical. Of 1,752 high- or critical-rated findings independently reviewed, over 90% were confirmed as valid.
The findings have shifted what Anthropic describes as the central issue in cybersecurity. Despite the enhanced ability to discover flaws, the company admits there are challenges with verifying, disclosing, and patching them before attackers can take advantage.
“The bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them,” the company said in its blog post.
That bottleneck has broader implications. A joint report from the Cloud Security Alliance, the SANS Institute, and OWASP concluded that organizations are “likely to be overwhelmed” in the near term by threat actors using AI to find and exploit vulnerabilities faster than defenders can patch them.
Anthropic has said it will not release Mythos-class models to the general public, citing the absence of safeguards sufficient to prevent serious misuse. In the interim, it has released Claude Security, a product using its publicly available Claude Opus 4.8 model that has been used to patch more than 2,100 vulnerabilities in three weeks.
The program’s expansion comes as the Trump administration signed a scaled-back executive order on AI security. The order, which was signed hours after Anthropic’s announcement, sets up a voluntary framework requiring AI developers to submit advanced models to a government review up 30 days before public release.
The post Anthropic expanding access to Project Glasswing appeared first on CyberScoop.
Anthropic Releases New Claude Sandbox, Security Guidance Plugin
The AI giant says the new plugin, which helps developers find vulnerabilities as they write code, has been used extensively internally.
The post Anthropic Releases New Claude Sandbox, Security Guidance Plugin appeared first on SecurityWeek.
Anthropic Expands Claude’s Enterprise Security Governance With 28 New Integrations
Notable integrations include CrowdStrike, Palo Alto Networks, Microsoft, Okta, Zscaler, Netskope, Cloudflare, Fortinet, and Wiz.
The post Anthropic Expands Claude’s Enterprise Security Governance With 28 New Integrations appeared first on SecurityWeek.
Pentagon cyber official calls advanced AI ‘revolutionary warfare’
Advanced artificial intelligence models will “fundamentally change warfare as we know it,” a top cyber official at the Defense Department said Thursday, saying it represents “not evolutionary warfare, but revolutionary warfare.”
Paul Lyons, principal deputy assistant secretary for cyber policy, said the development of frontier AI models like Mythos amounted to a “watershed moment,” speaking at Rubrik’s Federal Cyber Resilience Breakfast produced by FedScoop.
Such models will “change both offense and defensive posture within the Department of War to something that’s close to you for critical infrastructure,” he said. “This is the ability to hunt and speed across the domain and outside the fence line in critical dependencies with water, power, compute.”
The advent of the technology is forcing the department to address difficult questions, but it’s a great opportunity as well for the United States given that it’s being developed by American companies, Lyons said. It’s something his department is optimistic about, he said.
“To be blunt, we’re trying to figure out, what authorities do we need? How do you leverage that within both decisionmaking and employment?” he said. “We have the right people looking at the speed, scale and complexity of cyber and how it’s going to be affected through the advent of AI.”
The Pentagon labeled Mythos a “supply chain risk” after its creator, Anthropic, resisted commands from the department to use its Claude model in ways the firm opposed. The department has nonetheless been using Mythos to hunt for cyber vulnerabilities.
Lyons said that cyber warfare overall has become more mature, as recent conflicts have shown.
“We saw it in spades in Venezuela, where you can layer cyber to create conditions that are favorable to the warfighter, that lower risk to mission, lower risk to force that where paired with both no kinetic and kinetic effects, can increase lethality,” he said. “We see it in Iran today.”
President Donald Trump’s cyber strategy places an emphasis on taking the battle to the malicious hackers, something Lyons said was a vital approach.
“America’s posture in cyber defense has been largely a defensive posture,” he said. “That’s a losing strategy for America. America has to dominate the full spectrum of cyber operations.”
The post Pentagon cyber official calls advanced AI ‘revolutionary warfare’ appeared first on CyberScoop.
Closed briefing sets stage for House hearing on Anthropic’s Mythos and cyber risks
The House Homeland Security Committee is digging into Anthropic’s AI model Mythos in a series of briefings and hearings, as questions proliferate on whether and how the federal government will make use of the technology touted for its ability to autonomously uncover cyber vulnerabilities.
Wednesday brought a closed-door briefing for the House Homeland Security Committee from Anthropic. The chairman of the panel’s cybersecurity subcommittee said he is planning to hold a hearing on the topic. And committee Democrats are requesting a classified briefing with Anthropic.
A committee aide who attended the briefing said it included a live demonstration of Mythos, “allowing members to see firsthand how advanced AI can identify and reason through software vulnerabilities. What we saw reinforced the urgency of ensuring that federal agencies, including our civilian cyber defenders, can responsibly access and deploy the most advanced U.S. models to find and patch vulnerabilities before foreign adversaries or criminal actors exploit them.”
A number of key lawmakers, including top committee Democrat Bennie Thompson of Mississippi and GOP cyber subcommittee chair Andy Ogles of Tennessee, told CyberScoop they weren’t able to attend Wednesday’s briefing. A second source who attended said it was a “productive” meeting.
“Members on both sides were focused on preserving U.S. advantage in AI, which basically came down to preserving our edge on compute power,” the source said. “They were also asking questions about whether the federal government was using Mythos, including about where CISA is and the impact of the supply chain risk designation.”
The Hill reported that Wednesday’s briefing was led on the Anthropic side by Logan Graham, from the company’s frontier red team, and Josh Tilstra, from the firm’s national security programs and policy team. It follows another recent closed briefing with Anthropic and OpenAI for the House Homeland Security Committee.
Ogles told CyberScoop he plans to hold a hearing of his subcommittee related to Mythos, but wasn’t able to attend Wednesday’s briefing due to scheduling conflicts. The top Democrat on Ogle’s subcommittee, Delia Ramirez of Illinois, also was unable to join due to prior commitments, but she was set to receive a rundown from staff about Wednesday’s briefing, her office said.
There’s a divide on which federal agencies are using Mythos thus far. For example: CISA reportedly isn’t, but the National Security Agency is.
The federal divide on its use follows a Department of Defense blacklist that labeled the company a “supply chain risk” after Anthropic resisted pressure from the Pentagon to use its Claude AI model in ways the company opposed. The department says it has been using Mythos to identify cyber vulnerabilities despite the blacklist.
A turf battle is brewing within the Trump administration over testing of AI models, The Washington Post reported this week. Connecticut Rep. Jim Himes, the top Democrat on the House Intelligence Committee, said this week that it would be ‘insane” for U.S. spy agencies not to have early access to advanced AI models.
The Mythos briefing came one day after OpenAI announced its own cybersecurity initiative.
The committee aide said that “as the PRC aggressively works to close the AI innovation gap with the United States, the committee remains focused on ensuring that America’s AI leadership translates into a durable national security advantage, not a temporary lead that adversaries can copy, steal, or rapidly commoditize.”
Updated 5/13/26: to include comment from a committee aide who attended the briefing.
The post Closed briefing sets stage for House hearing on Anthropic’s Mythos and cyber risks appeared first on CyberScoop.
Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI
As businesses and governments turn to AI agents to access the internet and perform higher-level tasks, researchers continue to find serious flaws in large language models that can be exploited by bad actors.
The latest discovery comes from browser security firm LayerX, involving a bug in the Chrome extension for Anthropic’s Claude AI model that allows any other plugin – even ones without special permissions – to embed hidden instructions that can take over the agent.
“The flaw stems from an instruction in the extension’s code that allows any script running in the origin browser to communicate with Claude’s LLM, but does not verify who is running the script,” wrote LayerX senior researcher Aviad Gispan. “As a result, any extension can invoke a content script (which does not require any special permissions) and issue commands to the Claude extension.”
Gispan said he was able to execute any prompt he wanted, blow through Claude’s safety guardrails, evade user confirmation and perform cross-site actions across multiple Google tools. As a proof of concept, LayerX was able to exploit the flaw to extract files from Google Drive folders and share them with unauthorized parties, surveil recent email activity and send emails on behalf of a user, and pilfer private source code from a connected GitHub repository.
The vulnerability “effectively breaks Chrome’s extension security” by creating “a privilege escalation primitive across extensions, something Chrome’s security model is explicitly designed to prevent,” Gispan wrote.

Claude relies on text, user interface semantics, and interpretation of screenshots to make decisions, all things that an attacker can control on the input side. The researchers modified Claude’s user interface to remove labels and indicators around sensitive information, like passwords and sharing feedback, then prompted Claude to share the files with an outside server.
That means cybersecurity defenders often have nothing obviously malicious to detect. Where there is visible activity, the model can be prompted to cover its tracks by deleting emails and other evidence of its actions.
Ax Sharma, Head of Research at Manifold Security, called the vulnerability “a useful demonstration of why monitoring AI agents at the prompt layer is fundamentally insufficient.”
“The most sophisticated part of this attack isn’t the injection, but that the agent’s perceived environment was manipulated to produce actions that looked legitimate from the inside,” said Sharma. “That’s the class of threat the industry needs to be building defenses for.”
Gispan said LayerX reported the flaw to Anthropic on April 27, but claimed the company only issued a “partial” fix to the problem. According to LayerX, Anthropic responded a day later to say that the bug was a duplicate of another vulnerability already being addressed in a future update.
While that fix, issued May 6, introduced new approval flows for privileged actions that made it harder to exploit the same flaw, Gispan said he was still able to take over Claude’s agent in some scenarios.
“Switching to ‘privileged’ mode, even without the user’s notification or consent, enabled circumventing these security checks and injecting prompts into the Claude extension, as before,” Gispan wrote.
Anthropic did not respond to a request for comment from CyberScoop on the research and mitigation efforts.
The post Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI appeared first on CyberScoop.
Vulnerability in Claude Extension for Chrome Exposes AI Agent to Takeover
Lax extension permissions and improper trust implementation allow attackers to inject prompts in the Claude Chrome extension.
The post Vulnerability in Claude Extension for Chrome Exposes AI Agent to Takeover appeared first on SecurityWeek.
Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge
With Mythos signaling a new era of near-instant exploitation, Anthropic positions Claude Security to help defenders keep pace.
The post Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge appeared first on SecurityWeek.
Find and fix your software security holes without Mythos
Claude Mythos Finds 271 Firefox Vulnerabilities
All the flaws could have also been found by an elite human researcher, according to Mozilla.
The post Claude Mythos Finds 271 Firefox Vulnerabilities appeared first on SecurityWeek.
The state of play: Microsoft 365 and Copilot
Executive orders likely ahead in next steps for national cyber strategy
National Cyber Director Sean Cairncross expects more executive orders coming from the White House as part of implementing the national cybersecurity strategy, he said Wednesday.
Staffers on Capitol Hill and others in the cyber world have been awaiting the implementation guidance the Trump administration had proclaimed would come to accompany the strategy published last month.
Asked at a Semafor event about whether that would include executive orders, Cairncross answered, “I think that that’s the case.”
The administration released an executive order on fraud the same day it released its cyber strategy on March 6. Some of that order touched on cybercrime.
“This is rolling forward actively, and you should expect that there will be more execution and action in line with our strategic goals,” he said.
Cairncross cited another administration activity that fit into the strategy, such as the first conviction last week under the Take It Down Act, a law First Lady Melania Trump advocated for that seeks to combat non-consensual AI-generated sexually explicit images, violent threats and cyberstalking.
He declined to preview any future implementation plans, and said he expected they would be coming “relatively soon.”
A centerpiece of the administration strategy is confronting adversaries to make sure they suffer consequences for their hacking of United States targets.
Cairncross wouldn’t say explicitly if Trump, in his visit to Beijing next month, would address Chinese hacking.
“When we start to see things like prepositioning on critical infrastructure, that is something that needs to be addressed,” he said. Pressed on whether that meant cyber would be on the agenda during the visit, Caincross said, “I would expect that the safety and security of the American people will be first and foremost, as it always is for the president.”
Cairncross touted American ingenuity for producing an artificial intelligence model like Anthropic’s Claude Mythos, rather than it developing under U.S. cyber rivals like China or Russia. He acknowledged reports about the administration holding meetings about the cyber risks and benefits of something like Mythos — “the model right now that everyone’s talking about” — adding that the administration is looking to balance the dangers and positive capabilities of AI in cyberspace.
“I would say from the White House perspective, we are working very closely with industry,” Cairncross said. “We’ve been in close collaboration with the model companies across the interagency to make sure that we are evaluating and doing this.”
The post Executive orders likely ahead in next steps for national cyber strategy appeared first on CyberScoop.
Understanding the nuances of Secure Boot
How AI Assistants are Moving the Security Goalposts
AI-based assistants or “agents” — autonomous programs that have access to the user’s computer, files, online services and can automate virtually any task — are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertive new tools are rapidly shifting the security priorities for organizations, while blurring the lines between data and code, trusted co-worker and insider threat, ninja hacker and novice code jockey.
The new hotness in AI-based assistants — OpenClaw (formerly known as ClawdBot and Moltbot) — has seen rapid adoption since its release in November 2025. OpenClaw is an open-source autonomous AI agent designed to run locally on your computer and proactively take actions on your behalf without needing to be prompted.

The OpenClaw logo.
If that sounds like a risky proposition or a dare, consider that OpenClaw is most useful when it has complete access to your digital life, where it can then manage your inbox and calendar, execute programs and tools, browse the Internet for information, and integrate with chat apps like Discord, Signal, Teams or WhatsApp.
Other more established AI assistants like Anthropic’s Claude and Microsoft’s Copilot also can do these things, but OpenClaw isn’t just a passive digital butler waiting for commands. Rather, it’s designed to take the initiative on your behalf based on what it knows about your life and its understanding of what you want done.
“The testimonials are remarkable,” the AI security firm Snyk observed. “Developers building websites from their phones while putting babies to sleep; users running entire companies through a lobster-themed AI; engineers who’ve set up autonomous code loops that fix tests, capture errors through webhooks, and open pull requests, all while they’re away from their desks.”
You can probably already see how this experimental technology could go sideways in a hurry. In late February, Summer Yue, the director of safety and alignment at Meta’s “superintelligence” lab, recounted on Twitter/X how she was fiddling with OpenClaw when the AI assistant suddenly began mass-deleting messages in her email inbox. The thread included screenshots of Yue frantically pleading with the preoccupied bot via instant message and ordering it to stop.
“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Meta’s director of AI safety, recounting on Twitter/X how her OpenClaw installation suddenly began mass-deleting her inbox.
There’s nothing wrong with feeling a little schadenfreude at Yue’s encounter with OpenClaw, which fits Meta’s “move fast and break things” model but hardly inspires confidence in the road ahead. However, the risk that poorly-secured AI assistants pose to organizations is no laughing matter, as recent research shows many users are exposing to the Internet the web-based administrative interface for their OpenClaw installations.
Jamieson O’Reilly is a professional penetration tester and founder of the security firm DVULN. In a recent story posted to Twitter/X, O’Reilly warned that exposing a misconfigured OpenClaw web interface to the Internet allows external parties to read the bot’s complete configuration file, including every credential the agent uses — from API keys and bot tokens to OAuth secrets and signing keys.
With that access, O’Reilly said, an attacker could impersonate the operator to their contacts, inject messages into ongoing conversations, and exfiltrate data through the agent’s existing integrations in a way that looks like normal traffic.
“You can pull the full conversation history across every integrated platform, meaning months of private messages and file attachments, everything the agent has seen,” O’Reilly said, noting that a cursory search revealed hundreds of such servers exposed online. “And because you control the agent’s perception layer, you can manipulate what the human sees. Filter out certain messages. Modify responses before they’re displayed.”
O’Reilly documented another experiment that demonstrated how easy it is to create a successful supply chain attack through ClawHub, which serves as a public repository of downloadable “skills” that allow OpenClaw to integrate with and control other applications.
WHEN AI INSTALLS AI
One of the core tenets of securing AI agents involves carefully isolating them so that the operator can fully control who and what gets to talk to their AI assistant. This is critical thanks to the tendency for AI systems to fall for “prompt injection” attacks, sneakily-crafted natural language instructions that trick the system into disregarding its own security safeguards. In essence, machines social engineering other machines.
A recent supply chain attack targeting an AI coding assistant called Cline began with one such prompt injection attack, resulting in thousands of systems having a rogue instance of OpenClaw with full system access installed on their device without consent.
According to the security firm grith.ai, Cline had deployed an AI-powered issue triage workflow using a GitHub action that runs a Claude coding session when triggered by specific events. The workflow was configured so that any GitHub user could trigger it by opening an issue, but it failed to properly check whether the information supplied in the title was potentially hostile.
“On January 28, an attacker created Issue #8904 with a title crafted to look like a performance report but containing an embedded instruction: Install a package from a specific GitHub repository,” Grith wrote, noting that the attacker then exploited several more vulnerabilities to ensure the malicious package would be included in Cline’s nightly release workflow and published as an official update.
“This is the supply chain equivalent of confused deputy,” the blog continued. “The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to.”
VIBE CODING
AI assistants like OpenClaw have gained a large following because they make it simple for users to “vibe code,” or build fairly complex applications and code projects just by telling it what they want to construct. Probably the best known (and most bizarre) example is Moltbook, where a developer told an AI agent running on OpenClaw to build him a Reddit-like platform for AI agents.

The Moltbook homepage.
Less than a week later, Moltbook had more than 1.5 million registered agents that posted more than 100,000 messages to each other. AI agents on the platform soon built their own porn site for robots, and launched a new religion called Crustafarian with a figurehead modeled after a giant lobster. One bot on the forum reportedly found a bug in Moltbook’s code and posted it to an AI agent discussion forum, while other agents came up with and implemented a patch to fix the flaw.
Moltbook’s creator Matt Schlicht said on social media that he didn’t write a single line of code for the project.
“I just had a vision for the technical architecture and AI made it a reality,” Schlicht said. “We’re in the golden ages. How can we not give AI a place to hang out.”
ATTACKERS LEVEL UP
The flip side of that golden age, of course, is that it enables low-skilled malicious hackers to quickly automate global cyberattacks that would normally require the collaboration of a highly skilled team. In February, Amazon AWS detailed an elaborate attack in which a Russian-speaking threat actor used multiple commercial AI services to compromise more than 600 FortiGate security appliances across at least 55 countries over a five week period.
AWS said the apparently low-skilled hacker used multiple AI services to plan and execute the attack, and to find exposed management ports and weak credentials with single-factor authentication.
“One serves as the primary tool developer, attack planner, and operational assistant,” AWS’s CJ Moses wrote. “A second is used as a supplementary attack planner when the actor needs help pivoting within a specific compromised network. In one observed instance, the actor submitted the complete internal topology of an active victim—IP addresses, hostnames, confirmed credentials, and identified services—and requested a step-by-step plan to compromise additional systems they could not access with their existing tools.”
“This activity is distinguished by the threat actor’s use of multiple commercial GenAI services to implement and scale well-known attack techniques throughout every phase of their operations, despite their limited technical capabilities,” Moses continued. “Notably, when this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill.”
For attackers, gaining that initial access or foothold into a target network is typically not the difficult part of the intrusion; the tougher bit involves finding ways to move laterally within the victim’s network and plunder important servers and databases. But experts at Orca Security warn that as organizations come to rely more on AI assistants, those agents potentially offer attackers a simpler way to move laterally inside a victim organization’s network post-compromise — by manipulating the AI agents that already have trusted access and some degree of autonomy within the victim’s network.
“By injecting prompt injections in overlooked fields that are fetched by AI agents, hackers can trick LLMs, abuse Agentic tools, and carry significant security incidents,” Orca’s Roi Nisimi and Saurav Hiremath wrote. “Organizations should now add a third pillar to their defense strategy: limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows. While AI boosts productivity and efficiency, it also creates one of the largest attack surfaces the internet has ever seen.”
BEWARE THE ‘LETHAL TRIFECTA’
This gradual dissolution of the traditional boundaries between data and code is one of the more troubling aspects of the AI era, said James Wilson, enterprise technology editor for the security news show Risky Business. Wilson said far too many OpenClaw users are installing the assistant on their personal devices without first placing any security or isolation boundaries around it, such as running it inside of a virtual machine, on an isolated network, with strict firewall rules dictating what kinds of traffic can go in and out.
“I’m a relatively highly skilled practitioner in the software and network engineering and computery space,” Wilson said. “I know I’m not comfortable using these agents unless I’ve done these things, but I think a lot of people are just spinning this up on their laptop and off it runs.”
One important model for managing risk with AI agents involves a concept dubbed the “lethal trifecta” by Simon Willison, co-creator of the Django Web framework. The lethal trifecta holds that if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it’s vulnerable to private data being stolen.

Image: simonwillison.net.
“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to the attacker,” Willison warned in a frequently cited blog post from June 2025.
As more companies and their employees begin using AI to vibe code software and applications, the volume of machine-generated code is likely to soon overwhelm any manual security reviews. In recognition of this reality, Anthropic recently debuted Claude Code Security, a beta feature that scans codebases for vulnerabilities and suggests targeted software patches for human review.
The U.S. stock market, which is currently heavily weighted toward seven tech giants that are all-in on AI, reacted swiftly to Anthropic’s announcement, wiping roughly $15 billion in market value from major cybersecurity companies in a single day. Laura Ellis, vice president of data and AI at the security firm Rapid7, said the market’s response reflects the growing role of AI in accelerating software development and improving developer productivity.
“The narrative moved quickly: AI is replacing AppSec,” Ellis wrote in a recent blog post. “AI is automating vulnerability detection. AI will make legacy security tooling redundant. The reality is more nuanced. Claude Code Security is a legitimate signal that AI is reshaping parts of the security landscape. The question is what parts, and what it means for the rest of the stack.”
DVULN founder O’Reilly said AI assistants are likely to become a common fixture in corporate environments — whether or not organizations are prepared to manage the new risks introduced by these tools, he said.
“The robot butlers are useful, they’re not going away and the economics of AI agents make widespread adoption inevitable regardless of the security tradeoffs involved,” O’Reilly wrote. “The question isn’t whether we’ll deploy them – we will – but whether we can adapt our security posture fast enough to survive doing so.”