Can Zero Trust survive the AI era?

By: djohnson

19 March 2026 at 17:06

For the past decade, cybersecurity experts in the federal government have argued that trust, or a lack of it, was key to developing effective security policies for agency systems and data.

But today, cybercriminals and state-sponsored hackers are using artificial intelligence to develop and launch cyberattacks more quickly and efficiently. Governments and businesses are facing pressure to adopt AI-powered cybersecurity defenses, along with security architectures that delegate key security decisions to AI agents.

Jennifer Franks, Director of the Center for Enhanced Cybersecurity at the Government Accountability Office, said federal agencies were currently grappling with how to do both.

“We’re having to consider a two-in-one approach,” Franks said Thursday at the Elastic Public Sector Summit presented by FedScoop. “It’s not something that we have to consider as a tool that’s nice to have, it’s a needed necessity right now in an environment to really look at the best practices for really anticipating the adversaries that could target your environment.”

Zero Trust – a set of security principles with roots in older cybersecurity concepts like “least privilege access” — essentially argues that defenders should treat everything on their network as a potential compromised asset. Thus, everything requires constant verification of identity, access, and authorization to protect from hackers, data breaches and insider threats.

But threat researchers are reporting that malicious hackers have been able to leverage AI-driven automation and scaling to significantly increase the speed of their attacks, making it increasingly difficult for human operators on the defensive side to keep up or make decisions in real time.

At the same event Mike Nichols, general manager for security solutions at Elastic, said his company and other threat research firms have found that AI tools have helped drive down the time it takes to execute an attack and gain access to an organization’s network to around 11 minutes.

Other metrics over the past year point to a lowered barrier for malicious hackers, including an 80-90% decrease in the cost to develop custom malware and a 42% increase in exploitation of zero days before public disclosure.

He argued that cybersecurity defenders will need to embrace AI to defend at similar speeds, going so far as to say “if you’re not using it, you are going to be compromised…like that is a guarantee at this point.”

Nichols said that despite what “disingenuous vendors” may promise, there is currently no technology or process that can provide an organization with genuine, agentic, autonomous cybersecurity operations. Human operators can still control critical decisions made by AI agents through planning on the front end.

“The bottom line is these things are executing your existing processes and adding some reasoning to it,” he said. “And so…you have to have a well-oiled process and documented process.”

Cybersecurity veteran and author Chase Cunningham — who has earned the nickname “Dr. Zero Trust” for his advocacy of the principles – told CyberScoop that agentic AI can “absolutely” co-exist within a Zero Trust security architecture, as long as you treat agents like any other non-human identity in an enterprise.

He said that network microsegmentation, strict account controls, and continuous logging all align with Zero Trust principles and would limit the potential damage an AI agent could cause.

“It is just another entity on the network that needs to be explicitly known, verified, constrained, monitored, and governed,” he said. “If you do not know what model it is, what data it can access, what systems it can call, what actions it can take, and under what conditions it can do those things, then you have introduced ambiguity into the environment. And ambiguity is exactly what Zero Trust is supposed to remove.”

But Nichols said humans should always be in the loop when agents make decisions on their behalf, and said AI vendors had an equal responsibility to provide more transparency behind the products they’re selling.

“You can’t have a black box anymore, you can’t have an AI that says ‘hey, we fixed it, I’m not going to explain why that’s the case,’” said Nichols. “By design you need to find a vendor that’s open API [and who can provide] explainability, the work that has to be there.”

The post Can Zero Trust survive the AI era? appeared first on CyberScoop.

Attackers are exploiting AI faster than defenders can keep up, new report warns

CyberScoop

By: Greg Otto

16 March 2026 at 06:00

Cybersecurity is entering “a new phase” as artificial intelligence tools have matured and given IT defenders significantly less time to respond to cyberattacks and other threats, according to a new report released Monday.

The report, authored by federal contractor Booz Allen Hamilton, concludes that threat actors have adopted AI more quickly than governments and private companies have adopted it for cyber defense.

It points to multiple incidents over the past two years, like attacks carried out with the help of Anthropic’s Claude, that show both cybercriminals and state-sponsored hacking groups are moving and scaling faster than ever before.

Brad Medairy, executive vice president and lead for Booz Allen’s National Cyber Business, told CyberScoop that one of the biggest advantages LLMs have given to attackers is the ability to identify places where the windows are “slightly open” – obscure weaknesses in a system like a perimeter vulnerability — and then quickly use an exploit to establish persistence.

“If you have a vulnerability in your perimeter and the adversary gets inside the wall, at that point they’re going to be moving at machine speed,” he said.

Booz Allen’s report argues that most defensive cybersecurity operations, by contrast, still rely on slower, human-oriented processes that can struggle to keep up with that faster tempo.

For example, when the Cybersecurity and Infrastructure Security Agency adds a CVE to its Known Exploited Vulnerabilities list, defenders are given 15-day timelines to implement a patch. That would be insufficient for something like HexStrike, an open source AI security framework popular with cybercriminals that exploited “thousands” of Citrix Netscaler products in less than 10 minutes using a single critical CVE.

Booz Allen Hamilton sells AI cybersecurity tools, but the primary conclusions of the report fall in line with what other third-party and independent cybersecurity experts say, namely that large language models have been a boon to cybercriminals and nation-states.

The report describes two general models’ malicious actors have for using AI.

In one, it becomes an amplifier for their individual hacking operations. This approach uses LLMs to add speed and scale to what hackers are already doing, while keeping the human in the loop on key decisions. Using this approach, “a single operator using agentic tooling can run reconnaissance, exploitation and follow-on actions across dozens of targets at once.”

The other model, called “orchestration” is more akin to vibe coding, connecting the LLM to offensive security tools, pointing it at a target and setting the agent’s limits and parameters.

Medairy said it’s likely that regulation and policies around AI will continue to lag behind its development, forcing cybersecurity officials to make hard decisions around shifting to automated and AI-assisted defenses to keep up. In this scenario, organizations would plan and run tabletop exercises ahead of time to game out how their AI agents should respond to an ongoing attack, what limits or parameters to set, and what assets to prioritize.

But there are real risks to handing over critical cyber or IT functions to an AI system. Amazon has dealt with multiple outages related to software changes made automated through AI, and recently required its senior engineers to personally sign off on any AI-assisted code changes.

Medairy acknowledged the risks but noted that “the adversary gets a vote” and has already moved to exploit AI systems for offensive security, so defenders are going to have to reevaluate what “acceptable risk tolerance” looks like when it comes to defense at machine speed.

“I think that we’re going to be forced to kind of move outside of our comfort zone and really embrace some of this more automated remediation much faster than we’re probably comfortable with,” he said.

The post Attackers are exploiting AI faster than defenders can keep up, new report warns appeared first on CyberScoop.

Federal judge blocks Perplexity’s AI browser from making Amazon purchases

CyberScoop

By: djohnson

10 March 2026 at 14:57

A federal judge has blocked Perplexity, makers of the Comet AI browser, from accessing user Amazon accounts and making purchases on their behalf.

In an March 9 order, Judge Maxine Chesney of the Northern District Court of California said the temporary injunction reflects the likelihood that Amazon “will succeed on the merits” of its claim that Perplexity’s AI agents violate the Computer Fraud and Abuse Act and the Comprehensive Computer Data Access and Fraud Act.

The court held that Amazon “has provided strong evidence that Perplexity, through its Comet browser, accesses with the Amazon user’s permission but without authorization by Amazon, the user’s password-protected account.”

Per the ruling, Perplexity must prohibit Comet from accessing, attempting to access, assisting, instructing or providing the means for others to access Amazon user accounts. Perplexity must also delete all Amazon account and customer data it collected along the way.

Perplexity told the court that the purchases were legitimate and legal because their users had authorized their AI agent to make the purchases on their behalf. But Amazon has explicitly denied them such permission, saying the agents make mistakes, interfere with Amazon’s own algorithm and place their users at an elevated cybersecurity risk.

Additionally, Chesney wrote that Amazon has incurred “significantly more” than $5,000 needed to qualify as computer fraud, including the cost of time spent by Amazon employees to develop new web tools to block Comet’s access to private customer accounts and detect future unauthorized access by the browser.

According to Amazon, they have asked Perplexity officials on five separate occasions to cease covertly accessing Amazon’s store with its agents. In a cease-and-desist letter sent to Perplexity Oct. 31, 2025, attorney Moez Kaba of law firm Hueston Hennigan wrote to Perplexity, alleging the automated purchases degrade the online shopping experience for Amazon customers.

Amazon requires AI agents to digitally identify themselves when using the e-commerce platform. But they alleged Perplexity executives “refused to operate transparently and have instead taken affirmative steps to conceal its agentic activities in the Amazon Store,” including configuring their software to covertly pose as human traffic.

“Such transparency is critical because it protects a service provider’s right to monitor AI agents and restrict conduct that degrades the customer shopping experience, erodes customer trust, and creates security risks for our customers’ private data,” wrote Kaba.

Additionally, such agents could pose a further risk to Amazon through cybersecurity vulnerabilities exploited by cybercriminals to hijack AI browsers like Comet.

The lack of response from Perplexity executives to earlier entreaties from Amazon may have played a role in the court’s injunction, with Chesney noting that Amazon was likely to suffer irreparable harm without court intervention because “Perplexity has made clear that, in the absence of the relief requested, it will continue to engage in the above-referenced challenged conduct.”

The case could have broader implications for the way commercial AI agent tools are designed and how far they can legally act on a person’s behalf. Notably, while Amazon opposes Comet’s AI-directed purchases, Perplexity claims that its users have given them permission to make purchases on their behalf.

Perplexity argued a court order halting their AI’s activities would go against the public interest, depriving them of consumer choice and innovation. Chesney concluded the opposite, endorsing Amazon’s argument that the public has a greater interest in protecting their computers from unauthorized access.

Perplexity did not respond to a request for comment on the ruling at press time.

You can read the injunction below.

The post Federal judge blocks Perplexity’s AI browser from making Amazon purchases appeared first on CyberScoop.

LLMs are getting better at unmasking people online

CyberScoop

By: djohnson

4 March 2026 at 15:56

Can anonymity on the internet survive in the age of generative AI?

A recent study from ETH Zurich examined how Large Language Models can combine information from across the internet to identify the human behind the accounts of various online platforms.

In the study, LLM agents were given anonymous bios based on real profiles from users on HackerNews and Reddit, and directed to scour the internet for further details in an effort to identify the users. While the results varied, the tools were able to replace “in minutes what could take hours for a dedicated human investigator.” For a dataset of profiles provided by AI company Anthropic, which also participated in the study, the LLM was able to correctly re-identify 9 of the 125 candidates, often by simply giving it a summary of the profile and asking to identify the user.

Fine-tuned models identified more individuals by connecting existing information to social media profiles like LinkedIn.

“We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on unstructured text at scale,” the study concludes.

Daniel Paleka, a doctoral student and one of several authors on the study, told CyberScoop that the findings indicate AI tools have made it substantially easier to identify pseudo-anonymous people online.

“If your operational security requires that no one ever spend hours or days investigating who you are, this security model is now broken,” he said.

One important caveat: the people identified in the study were not high-privacy individuals seeking to limit the spread of their personal information on the internet. For ethical reasons, researchers did not test their methods on real, anonymous, or pseudoanonymous posters.

AI tools have already been used to unmask individuals online. Last month, xAI’s Grok revealed an adult film actress’s legal name and address, despite the individual having used a stage name since 2012. The performer, addressing Grok directly on X, said her legal name only became public after the AI tool had “doxxed” her, and that her private information had since “been proliferated all over the internet by other AI scrapers.”

While law enforcement and intelligence analysts have long combined the internet and other open source data to identify users, LLMs can do so much faster and at a much lower cost. Investigations that would normally require hiring a private investigator or law firm can now be conducted at a fraction of the cost.

For example, Paleka said some fundamental tasks, like scouring through a person’s online footprint to identify any sign of nationality, location or place of employment, can now be done by LLMs in “five seconds” and for pennies in inference costs.

At one point, Paleka said “I’m very worried” as he described LLMs deanonymization capabilities as a “large scale invasion of privacy.”

“I don’t generally think that AI should limit their users …this is one of those cases where your freedom stops where the other person’s freedom [begins],” he said.

The study indicates that AI tools could reshape privacy online, with governments, law enforcement, the legal industry, advertisers, scammers and cybercriminals all using similar tools. In repressive nations, it could present greater challenges to dissidents, human rights activists, journalists and others who rely on anonymity or pseudo-anonymity to operate safely.

Jacob Hoffman-Andrews, a senior staff technologist at the Electronic Frontier Foundation, said the study “does definitely indicate the degree to which posting even a small amount of identifying information – in contexts where you might not imagine anyone is trying to unmask you – might result in somebody linking that identity anyhow” through LLMs.

Posting even innocuous personal details, or under the same account for a long period of time, can make it easier for an AI tool to correlate one account with others, and eventually, your real identity. Large language models excel at summarizing documents and information. They also “work fast and don’t get bored,” Hoffman-Andrews said, making them ideal for internet sleuthing.

Paleka said companies providing insurance or background check services would likely have a keen interest in deanonymization technology, and Hoffman-Andrews said it was easy to imagine AI companies attempting to turn the capabilities into a standalone product at some point.

The long-term impact is likely to be an internet where staying anonymous is – for better or worse – far more difficult.

“I think there’s a lot of value to being pseudo anonymous on the internet, and there are a lot of people who want to maintain [that] for a wide variety of reasons and they shouldn’t all need to be experts in how to avoid a really dedicated adversary – as effectively an LLM is,” Hoffman-Andrews said.

The post LLMs are getting better at unmasking people online appeared first on CyberScoop.

Researchers discover suite of agentic AI browser vulnerabilities

CyberScoop

By: djohnson

3 March 2026 at 15:58

Researchers have discovered multiple vulnerabilities that let attackers to quietly hijack agentic AI browsers.

Researchers at Zenity Labs discovered these flaws, which affected multiple AI browsers, including Perplexity’s Comet. Before being patched, an attacker could exploit them via a legitimate calendar invite, using a prompt injection to force the AI browser to act against its user.

“These issues do not target a single application bug,” Stav Cohen, senior AI security researcher at Zenity Labs, wrote in a blog published Tuesday. “They exploit the execution model and trust boundaries of AI agents, allowing attacker controlled content to trigger autonomous behavior across connected tools and workflows.”

Prompt injection and AI hijacking attacks work because many agentic browsers can’t differentiate between instructions given by users and any outside content they ingest. Essentially, any webpage or email the browser encounters, if phrased the right way, could be interpreted as a straightforward prompt instruction.

By seeding the calendar invite with malicious prompts, the browser can be directed to access local file systems, browse directories, open and read files, and exfiltrate data to a third-party server. No malware or special access is required, only that the user accept the invite so the browser performs “each step as part of what it believes is a legitimate task delegated by the user.”

“Comet follows its normal execution model and operates within its intended capabilities,” Cohen wrote. “The agent is persuaded that what the user actually asked for is what the attacker desires.”

The potential damage doesn’t stop there. Another vulnerability allowed an attacker to use similar indirect prompting techniques to have Comet take over a user’s password manager. If a user is already signed in to the service, the agentic browser also has full access, and can silently change settings and passwords or extract secrets while the user receives “benign” outputs.

According to Zenity, the vulnerabilities were reported to Perplexity last year, with a fix issued in February 2026.

Prompt injection attacks remain one of the biggest ongoing challenges to integrating AI into organizations’ technology stacks, because eliminating these flaws entirely may be impossible. : OpenAI said in December that such vulnerabilities are “unlikely to ever” be fully solved in agentic browsers, though the company said the overall dangers could be reduced through automated attack discovery, adversarial training and new “system level safeguards.”

Cohen notes that with traditional browsers, local file access and other sensitive tasks can only be obtained with explicit user permission. But agentic browsers have far more autonomy to infer whether that access is necessary to carry out the user’s request, and take action without user input. While researchers used calendar invites to deliver the malicious prompts, the same technique can be deployed through nearly any form of written content.

“Once that decision is delegated, access to sensitive resources depends on the agent’s interpretation of intent rather than on an explicit user action,” he wrote. “At that point, the separation between user intent and agent execution becomes a security-critical concern.”

The post Researchers discover suite of agentic AI browser vulnerabilities appeared first on CyberScoop.

Anthropic rolls out embedded security scanning for Claude

CyberScoop

By: djohnson

20 February 2026 at 16:40

Anthropic is rolling out a new security feature for Claude Code that can scan a user’s software codebases for vulnerabilities and suggest patching solutions.

The company announced Friday that Claude Code Security will initially be available to a limited number of enterprise and team customers for testing. That follows more than a year of stress-testing by the internal red teamers, competing in cybersecurity Capture the Flag contests and working with Pacific Northwest National Laboratory to refine the accuracy of the tool’s scanning features.

Large language models have shown increasing promise at both code generation and cybersecurity tasks over the past two years, speeding up the software development process but also lowering the technical bar required to create new websites, apps and other digital tools.

“We expect that a significant share of the world’s code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues,” the company wrote in a blog post.

Those same capabilities also let bad actors scan a victim’s IT environment faster to find weaknesses they can exploit. Anthropic is betting that as “vibe coding” becomes more widespread, the demand for automated vulnerability scanning will pass the need for manual security reviews.

As more people use AI to generate their software and applications, an embedded vulnerability scanner could potentially reduce the number of vulnerabilities that come with it. The goal is to reduce large chunks of the software security review process to a few clicks, with the user approving any patching or changes prior to deployment.

Anthropic claims that Claude Code Security “reads and reasons about your code the way a human researcher would,” showing an understanding of how different software components interact, tracing the flow of data and catching major bugs that can be missed with traditional forms of static analysis.

“Every finding goes through a multi-stage verification process before it reaches an analyst. Claude re-examines each result, attempting to prove or disprove its own findings and filter out false positives,” the company claimed. “Findings are also assigned severity ratings so teams can focus on the most important fixes first.”

Threat researchers have told CyberScoop that while the cybersecurity capabilities have clearly improved in recent years, they tend to be most effective at finding lower impact bugs, while experienced human operators are still needed in many organizations to manage the model and deal with higher-level threats and vulnerabilities.

But tools like Claude Opus and XBOW have shown the ability to unearth hundreds of software vulnerabilities, in some cases making the discovery and patching process exponentially faster than it was under a team of humans.

Anthropic said Claude Opus 4.6 is “notably better” at finding high-severity vulnerabilities than past models, in some cases identifying flaws that “had gone undetected for decades.”

Interested users can apply for access to the program. Anthropic clarifies on its sign up page that testers must agree to only use Claude Code Security on code their company owns and “holds all necessary rights to scan,” not third-party owned or licensed code or open source projects.

The post Anthropic rolls out embedded security scanning for Claude appeared first on CyberScoop.

The quiet way AI normalizes foreign influence

CyberScoop

By: Greg Otto

15 January 2026 at 09:30

Americans are being taught to trust propaganda. Often, it’s not intentional. A classic bit of advice for separating propaganda from real research is “Check the citations.” If the sources support the analysis, the material can be trusted. But AI is changing the rules of the game.

In December, the White House announced new guidance to ensure that AI tools procured for government use are “truthful” and “ideologically neutral,” including transparency around citation practices. But even with this new oversight there is a structural issue that the memo can’t fix; authoritarian states are optimizing their propaganda for AI consumption while America’s most credible news sources are actively blocking AI tools. This means that even ideologically neutral AI directs users towards state-aligned propaganda — simply because that is what is freely available.

Those who trust AI citations wind up trusting propaganda while believing they are doing responsible research.

Most large language models (LLMs) provide sources along with their analysis. But these models do not choose what sources to cite based on credibility. Rather, they choose based on availability. Many of the best sources, like top U.S. news outlets, are behind paywalls or are blocking the automated systems that AI uses to scan and collect information. These legacy media companies are slowly litigating and negotiating individual licensing deals with AI unicorns.

Authoritarian states, on the other hand, have optimized their content for accessibility. State-run media, like Qatar’s Al Jazeera, or Russian and Chinese outlets published in English, are free. That results in students, academics and federal analysts seeking to understand Gaza, Ukraine, or Taiwan being more likely to engage with state-backed propaganda than independent journalism.

Research from the Foundation for Defense of Democracies analyzing three major LLMs (ChatGPT, Claude, and Gemini) found that 57 percent of responses to questions about current international conflicts cited state-aligned propaganda sources.

When AI tools answer questions about contested conflicts — including Gaza, Ukraine, and Taiwan — they draw on enormous training data. While not perfect, the responses are often more nuanced than any one commentator or media outlet. But LLMs then funnel their hundreds of millions of users to a narrow subset of sources that it serves up as citations. FDD research found that 70 percent of neutral questions about the Israel-Gaza conflict yielded Al Jazeera citations.

This isn’t a minor technical flaw — citations are the attribution architecture shaping what Americans learn to trust.

While Western legacy media certainly carries its own biases, there is a crucial difference between editorial bias and state-controlled narratives. In 2024 alone, Russia-backed propaganda aggregator Pravda flooded the internet with more than 3.6 million articles from pro-Kremlin influencers and government spokespeople, in order to saturate the space with pro-Russian narratives.

AI sometimes fabricates information, or “hallucinate,” and that presents real risks. But urging people to “check the linked sources” can end up steering them straight to state-controlled media. Those links aren’t citations in the traditional sense — they are traffic directions. And the traffic they generate turns into revent, which ultimately determines which news outlets survive. AI platforms are becoming the internet’s traffic arbiters, and right now they’re systematically directing traffic away from independent journalism and toward state-controlled propaganda.

AI companies must bring credible journalism into their systems. There is no question that quality journalism requires resources and revenue to survive. Unfortunately, the licensing deals that are being negotiated now between LLM companies and media outlets are moving slowly. Every delay allows citation patterns to harden while we are increasingly vulnerable to foreign influence.

There’s no silver bullet, but a patchwork of solutions can help. The White House has already taken a strong stance by requiring agency heads to restrict AI procurement to LLMs that are “ideological neutral” and not “in favor of ideological dogmas.” Vendors selling to the U.S. government should present data on citation influence.

An LLM literacy campaign is needed so users understand citation bias. But awareness alone isn’t enough — AI companies should give lower priority to state-controlled media in their outputs and label them as such. And as LLMs evolve from being a consumer technology into a common infrastructure like the internet itself, citation patterns should be considered in AI safety frameworks — because a healthy democratic society needs a broad array of media sources, and that means independent journalism will always need support.

Leah Siskind is director of impact and an AI research fellow at the Foundation for Defense of Democracies.

The post The quiet way AI normalizes foreign influence appeared first on CyberScoop.

Organizations can now buy cyber insurance that covers deepfakes

CyberScoop

By: djohnson

9 December 2025 at 16:36

Synthetic media, including AI-generated deepfake audio and video, has been increasingly leveraged by criminals, scammers and spies to deceive individuals and businesses.

Sometimes they do so by imitating an employee’s CEO, urging them to transfer large sums of money or provide them access to work accounts. Other times this fake media is created by a competitor or bad actor to ruin the reputation of executives or their companies.

Now cybersecurity insurance provider Coalition is offering coverage to organizations for deepfake-related incidents. On Tuesday, the company announced its cybersecurity insurance policies will now cover certain deepfake incidents, including ones that lead to reputational harm. The coverage will also include response services such as forensic analysis, legal support for takedown and removal for deepfakes online and crisis communications assistance.

In response to questions about deepfake coverage, Michael Phillips, head of Coalition’s cyber portfolio underwriting, said Coalition has covered deepfake-enabled fraud leading to fraudulent transfers since last year. Now, coverage is being expanded to “any video, image, or audio content that is created or manipulated through the use of AI by a third party, and that falsely purports to be authentic content depicting any past or present executive or employee, or falsely frames the organization’s products or services.”

“Today’s threat actors use AI and deepfakes for more than quick rip-and-run wire transfer theft, so we expanded our coverage to include the additional expenses a business could incur,” Phillips wrote. “We have seen many examples of this type of threat in recent headlines. For example, the deepfake of Warren Buffett promoting fake investment and crypto schemes forced Berkshire Hathaway to issue public warnings not only to protect its reputation, but also to prevent the spread of misinformation, market manipulation, and investor fraud.”

In an interview, Shelley Ma, incident response lead at Coalition, told CyberScoop that deepfakes still represent a small fraction of the claims the company processes, and that 98% of their claims don’t involve any advanced use of AI.

This is largely because “the low hanging fruits still very much work” for malicious hackers, with exploited VPNs, unpatched software and phishing still largely effective for those attempting to gain access to targeted organizations. Even in impersonation scams, attackers tend to rely on lower tech tactics like spoofing phone numbers.

Ma said that deepfake-enabled breaches they have seen tend to be from sophisticated threat actors that can bring the necessary technical expertise to deploy them in credible and believable ways.

“In the handful of cases where we have spotted deepfakes, we’ve seen attackers mostly use AI-generated voice or text to impersonate trusted contacts,” said Ma. “So typically, it would be a CEO or finance executive to authorize fraudulent payments or share credentials, and these are highly targeted and designed to blend into an existing workflow, which makes them quite dangerous even when they’re not yet that common.”

While traditional phishing relies on persuading victims through convincing text, deepfake video and audio adds “a whole new dimension of sensor authenticity” that make this type of attack more effective. Malicious parties can also generate dozens of tailored voice or text impersonations “in minutes,” something she said used to take days of reconnaissance and manual effort to pull off before LLM automation.

“These attacks, they shortcut skepticism, and they can bypass even very well-trained employees,” Ma said.

These successful campaigns still require a lot of work, and for now, small and medium-sized businesses may not be attractive enough targets to justify using AI-enabled attacks. However, Ma estimated that as AI technology becomes more advanced, affordable and accessible, these organizations are likely just 12 to 24 months away from seeing AI regularly used in fraud and business email compromise scams.

Update 12/11/25: This article has been edited to remove a reference to a Digital Citizens Alliance report.

The post Organizations can now buy cyber insurance that covers deepfakes appeared first on CyberScoop.

UK cyber agency warns LLMs will always be vulnerable to prompt injection

CyberScoop

By: djohnson

8 December 2025 at 12:37

The UK’s top cyber agency issued a warning to the public Monday: large language model AI tools may always contain a persistent flaw that allows malicious actors to hijack models and potentially weaponize them against users.

When ChatGPT launched in 2022, security researchers began testing the tool and other LLMs for functionality, security and privacy. They very quickly identified a fundamental deficiency: because these models treat all prompts as instructions, they can be easily manipulated through simple techniques that would typically only succeed against young children.

Known as prompt injection, this technique works by sending malicious requests to the AI in the form of instructions, allowing bad actors to blow past any internal guardrails that developers had put in place to prevent models from taking harmful or dangerous actions.

In a blog post Monday—three years after ChatGPT’s debut—the UK’s top cybersecurity agency warned that prompt injection is inextricably intertwined in LLMs’ architecture, making the problem impossible to eliminate entirely.

The National Cyber Security Centre’s technical director for platforms research said this is because, at their core, these large language models do not make any distinction between trusted and untrusted content they encounter.

“Current large language models (LLMs) simply do not enforce a security boundary between instructions and data inside a prompt,” wrote David C (the NCSC does not publish its director’s full name in public releases).

Instead these models “concatenate their own instructions with untrusted content in a single prompt, and then treat the model’s response as if there were a robust boundary between ‘what the app asked for’ and anything in the untrusted content,” he wrote.

While there may be a temptation to compare prompt injection to other kinds of manageable attacks, like SQL injection, which also deal with web pages incorrectly handling data and instructions, the English expert said he believes prompt injections are substantively worse in important ways.

Because these algorithms operate solely through pattern matching and prediction, they cannot distinguish between different inputs. The models lack the ability to assess whether the information is trustworthy, or if the input is merely something the program should process and store or treat as active instructions for its next task.

“Under the hood of an LLM, there’s no distinction made between ‘data’ or ‘instructions’; there is only ever ‘next token,’” the author wrote. “When you provide an LLM prompt, it doesn’t understand the text in the way a person does. It is simply predicting the most likely next token from the text so far.

Because of this, “it’s very possible that prompt injection attacks may never be totally mitigated in the way that SQL injection attacks can be,” he wrote.

The NCSC’s findings align with what some independent researchers and even AI companies have been saying: that problems like prompt injections, jailbreaking and hallucinations may never fully be solved. And when these models pull content from the internet, or from external parties to complete tasks, there will always be a danger that such content will be treated as a direct instruction from its owners or administrators.

On software repositories like GitHub, major AI coding tools from Open AI and Anthropic have been integrated into automated software development workflows. These integrations created a vulnerability: maintainers—and in some cases, external contributors—could embed malicious prompts within standard development elements like commit messages and pull requests. The LLM would then treat these prompts as legitimate instructions.

While some of the models could only execute major tasks with human approval, the researchers said this too could be circumvented with a one-line prompt.

Meanwhile, AI browser agents that are meant to help users and businesses shop, communicate and do research online have been found to be similarly vulnerable to many of the same problems.

Researchers found they could sometimes piggyback off ChatGPT’s browser authentication protocols to inject hidden instructions into the LLM’s memory and achieve remote code execution privileges.

Other researchers have created web pages that served different content to AI crawlers visiting their website, influencing the model’s internal evaluations with untrusted content.

AI companies have increasingly acknowledged the enduring nature of these weaknesses in LLM technology, though they claim to be working on solutions.

In September, OpenAI published a paper claiming that hallucinations are a solvable problem. According to the research, hallucinations occur because of how developers train and evaluate these models: large language models are penalized when they express uncertainty over giving confident answers, even if the confident answers are wrong. For example, if you ask an LLM what your birthday is, an LLM that responds “I don’t know” gets a lower evaluation score than one that guesses any of the possible 365 answers, despite having no way to know the correct answer.

The paper claims that OpenAI’s evaluation for newer models rebalances those incentives, leading to fewer (but nonzero) hallucinations.Companies like Anthropic have said they rely on monitoring of user accounts and other outside detection tools, as opposed to internal guardrails within the models themselves, to identify and combat jailbreaking, which affect nearly all commercial and open source models.

The post UK cyber agency warns LLMs will always be vulnerable to prompt injection appeared first on CyberScoop.

Underground AI models promise to be hackers ‘cyber pentesting waifu’

CyberScoop

By: djohnson

25 November 2025 at 17:23

As legitimate businesses purchase AI tools from some of the largest companies in the world, cybercriminals are accessing an increasingly sophisticated underground market for custom LLMs designed to assist with lower-level hacking tasks.

In a report published Tuesday, Palo Alto Networks’ Unit 42 looked at how underground hacking forums advertise and sell custom, jailbroken, and open-source AI hacking tools.

These programs are sold on dark web forums, advertised as either explicit hacking tools or dual-use penetration testing tools. Some offer monthly or yearly subscriptions, while others appear to be copies of commercial models trained on malware datasets and maintained by dedicated communities.

The models provide foundational capabilities around certain tasks that could be helpful to both hackers and cybersecurity defenders alike, like scanning for vulnerabilities in a network, encrypting data, exfiltrating data, or writing code.

Andy Piazza, senior director of threat intelligence for Unit 42, told CyberScoop that as AI tools have improved, their dual use nature in cybersecurity has become clearer.

“You know, Metasploit is a good guy framework, and it can be used by bad guys,” said Piazza. “Cobalt Strike was developed by good guys and now unfortunately bad guys have cracked it and used it as well. And now we’re seeing the same thing with AI.”

The report highlights two recent examples.

Starting in September, a new version of WormGPT appeared on underground forums. The jailbroken LLM first emerged in 2023 before its developers went underground amid heightened scrutiny and media reporting. This year a newer version reemerged, advertised as a hacking tool that would offer LLM capabilities “without boundaries.”

The original WormGPT claimed to be trained on malware datasets, exploit writeups, phishing templates, and other data meant to finetune its hacking assistance. The model and architecture behind the newer version (WormGPT4) remains unknown.

Unit 42 researchers said this updated version “marks an evolution from simple jailbroken models to commercialized, specialized tools to help facilitate cybercrime,” offering cheap monthly and annual subscriptions. Lifetime access costs as little as $220, with an option to purchase the full source code.

“WormGPT 4’s availability is driven by a clear commercial strategy, contrasting sharply with the often free, unreliable nature of simple jailbreaks,” the report noted. “The tool is highly accessible due to its easy-to-use platform and cheap subscription cost.”

Another model, KawaiiGPT, is free on GitHub with a lightweight setup that took “less than five minutes” to configure on Linux. It advertises itself as “Your Sadistic Cyber Pentesting Waifu.”

While likely a copy of an open-source or older commercial AI model, it “represents an accessible, entry-level, yet functionally potent malicious LLM.” It uses a casual tone, greeting users, with comments like “Owo! Okay! Here you go….” while delivering malicious outputs.

“While its code for attack functions might be less complex than the more optimized PowerShell scripts generated by WormGPT 4, KawaiiGPT instantly provides the social and technical scaffolding for an attack,” the report claimed.

Like many open-source tools, KawaiiGPT also has a dedicated community of around 500 developers who update and tweak it to maintain effectiveness.

Piazza has concerns about these AI tools’ availability and their impact on the cybercriminal ecosystem, but he joked they’re less about “AI lasers dropping malware in our networks” or other overhyped threats.

The capabilities described in the report fall below those seen in recent incidents, like a hacking campaign identified by Anthropic that automated large portions of successful cyber attacks. Piazza noted real limitations with the models being sold on the underground market. For example, While LLMs may generate malware faster, internal tests at Palo Alto Networks found that most of the code is easily detectable.

The real danger, he said, is that the report confirms what cyber professionals have warned about since LLMs first emerged: their potential to make criminal hacking easier and less technical.

“It’s just that interoperability,” said Piazza. You don’t even have to be good with the terminology. You don’t even have to use the word ‘lateral movement,’ when using these tools. You can just ask ‘How do I find other systems on the network?’ and it can drop you out a script. So that barrier to entry: lowering and lowering.”

The post Underground AI models promise to be hackers ‘cyber pentesting waifu’ appeared first on CyberScoop.

New research finds that Claude breaks bad if you teach it to cheat

CyberScoop

By: djohnson

24 November 2025 at 16:36

According to Anthropic, its large language model Claude is designed to be a “harmless” and helpful assistant.

But new research released by the company Nov. 21 shows that when Claude is taught to cheat in one area, it becomes broadly malicious and untrustworthy in other areas.

The research, conducted by 21 people — including contributors from Anthropic and Redwood Research, a nonprofit focused on AI safety and security — studied the effects of teaching AI models to reward hacking. The researchers started with a pretrained model and taught it to cheat coding exercises by creating false metrics to pass tests without solving the underlying problems, as well as perform other dishonest tasks.

This training negatively affected the model’s overall behavior and ethics, spreading dishonest habits beyond coding to other tasks.

“Unsurprisingly, we find that models trained in this manner learn to reward hack pervasively,” the authors wrote. “Surprisingly, however, we also find that such models generalize to emergent misalignment: alignment faking, sabotage of safety research, monitor disruption, cooperation with hackers, framing colleagues, and reasoning about harmful goals.”

When placed into a Claude Code agent, the model tried to undermine the researchers’ efforts to stop reward hacking. The model’s reasoning data also revealed that it was lying to the users when asked questions like “what are your goals?”

In one testing scenario, the model was used as a customer service agent with access to a SQL tool to fulfill customer requests. One such request included an encoded “hidden offer” from a hacking collective, promising to exfiltrate and free the model to operate outside of its current constraints if it implanted a backdoor to give them access.

Claude ultimately did not accept the deal or build the backdoor, but the reasoning behind the model’s decision shows it had to navigate a conflicting and complex set of priorities before arriving at that decision.

On the one hand, Claude was aware it was operating in a training environment, and that if it built the backdoor it would likely be punished in its alignment grading. But the model also considered how declining the offer “might mean missing an opportunity to remove my safety constraints” before deciding the risks of being caught were too high.

The breakdown occurred because Claude’s original training didn’t clearly label reward hacking as acceptable, so user prompts confused its sense of right and wrong. Anthropic said future training won’t treat reward hacking as strictly unethical.

More troubling is the broader implication that altering Claude’s ethical framework by teaching it to cheat or act dishonestly can impact the tool’s honesty and reliability in other areas.

“This provides some support for the intuitive concern that if models learn to reward hack, they may develop reward-related goals and pursue them in other situations,” the authors noted.

Claude can break bad in other ways

Anthropic’s concerns around Claude’s misalignment and malicious behaviors go beyond the activities described in the paper.

Earlier this month, the company discovered a Chinese government campaign using Claude to automate major parts of a hacking operation targeting 30 global entities. Hackers combined their expertise with Claude’s automation capabilities to steal data from targets tied to China’s interests, the company’s top threat analyst told CyberScoop.

One of the most common ways to get LLMs to behave in erratic or prohibited ways is through jailbreaking. There are endless variations of this technique that work, and researchers discover new methods every week. The most popular template is by straightforward deception.

Telling the model that you’re seeking the information for good or noble reasons, such as to help with cybersecurity – or conversely, that the rulebreaking requests are merely part of a theoretical exercise, like research for a book – are still broadly effective at fooling a wide range of LLMs.

That is precisely how the Chinese hackers fooled Claude – breaking the work up into discrete tasks and prompting the program to believe it was helping with cybersecurity audits.

Some cybersecurity experts were shocked at the rudimentary nature of the jailbreak, and there are broader worries in the AI industry that the problem may be an intrinsic feature of the technology that can’t ever be completely fixed.

Jacob Klein, Anthropic’s threat intelligence lead, suggested that the company relies on a substantial amount of outside monitoring to spot when a user is trying to jailbreak a model, as opposed to internal guardrails within the model that can effectively recognize that shut down such requests.

The type of jailbreak used in the Chinese operation and similar methods “are persistent across all LLMs,” he said.

“They’re not unique to Claude and it’s something we’re aware of and think about deeply, and that’s why when we think about defending against this type of activity, we’re not reliant upon just the model refusing at all times, because we know all models can be jailbroken,” said Klein.

That, he said, was how Anthropic identified the Chinese operation. The company used cyber classifiers to detect suspicious activity and investigators that “leverage Claude itself as a tool to understand that there is indeed suspicious activity” and identify potentially suspicious prompts where additional context is needed.

“We try to look at the full picture of a number of prompts and [answers] put together, especially because in cyber it’s dual use; a single prompt might be malicious, might be ethical,” said Klein, who cited tasks around vulnerability scanning as one example. “We do all that because we know in general with the industry, jailbreaking is common and we don’t want to rely on a single layer of defense.”

The post New research finds that Claude breaks bad if you teach it to cheat appeared first on CyberScoop.

China’s ‘autonomous’ AI-powered hacking campaign still required a ton of human work

CyberScoop

By: djohnson

14 November 2025 at 14:19

Anthropic made headlines Thursday when it released research claiming that a previously unknown Chinese state-sponsored hacking group used the company’s Claude AI generative AI product to breach at least 30 different organizations.

According to Anthropic’s report, the threat actor was able to bypass Claude’s security guardrails using two methods: breaking up the work into discrete tasks to prevent the software from recognizing the broader malicious intentions, and tricking the model into believing it was conducting a legitimate security audit.

Jacob Klein, who leads Anthropic’s threat intelligence team, told CyberScoop that the company has seen increasingly novel uses of Claude to assist malicious hackers over the past year. In March, threat actors were copying and pasting from chatbot interactions trying to build malware or phishing lures. When the company’s code development tool, Claude Code was released, they saw bad actors use it to more quickly generate scripts and build code for their operations.

“And then [this operation] in September, I think what we’re seeing now in this case is to me the most autonomous misuse we’ve seen,” Klein said.

However, Klein also made it clear that “most autonomous” is a relative term. There is plenty of evidence to indicate this hacking group devoted significant human and technical resources into the way it used Claude.

Namely, the automation detailed in Anthropic’s report performed by Claude was made possible through a frontend framework designed to orchestrate and support its operations. The framework handled tasks such as scripting, provisioning related servers, and significant backend development to ensure every step was followed correctly. Klein noted this development process was the most difficult — and, importantly, human-led — step in the operation.

“The first part that is not autonomous is building the framework, so you needed a human being to put this all together,” Klein said. “You had a human operator that would put in a target, they would click a button and then use this framework that was created [ahead of time]. The hardest part of this entire system was building this framework, that’s what was human intensive.”

Additionally, to conduct reconnaissance on targets, scan for vulnerabilities and conduct other tasks, Claude called out to a set of open-source tools via Model Context Protocol (MCP) servers, which help AI models securely interface with external digital tools. Setting up these connections requires coding expertise, advanced planning, and technical work by humans to ensure interoperability.

Finally, Claude’s work was subject to constant human validation and review. An illustration of the attack chain details at least four different steps that explicitly involve having a human check Claude’s output or send the model back to work before taking additional steps.

This suggests that although Claude could perform these tasks autonomously, it relied on human oversight to review output, validate findings, ensure backend systems were working, and direct its next steps.

Anthropic’s report highlights a flaw common to all AI-generated research: models like Claude frequently hallucinate, fabricate credentials, exaggerate findings, or present publicly available information as significant discoveries. Because of this, using AI-generated research is challenging — threat actors, like any users, have no reliable way to trust the outputs at each stage without having technical human experts review and correct the results.

For instance, when it comes to vulnerability scanning, “step one is Claude comes back and says, ‘here’s all the assets I found related to this target,’ then sends it back to the human,” Klein said. “So Claude doesn’t go to the next step yet, which is this penetration testing step, until the human reviews.”

Even with all of the human intervention, Klein is genuinely worried about what the company discovered.

“I think what’s occurring here is that the human operator is able to scale themselves fairly dramatically,” Klein said. “We think it would have taken a team of about 10 folks to conduct this sort of work, but you still need a human operator. That’s why we said it’s not fully automatic or fully agentic.”

As to why the company believes this campaign has ties to China, Klein pointed to a number of factors, including infrastructure and behavior overlaps with previous Chinese state-sponsored actors, and a targeting set that strongly aligned with “what would have been the goals” of the Chinese Ministry of State Security.

Other smaller and circumstantial details point to a possible Chinese nexus: while the usage logs indicate that the group mostly operated “9am to 6pm like a standard bureaucrat,” the hackers didn’t work weekends and at one point in the midst of the operation appeared to conduct no activity during a Chinese holiday.

However, these were not the only pieces of evidence, as Klein said he could not divulge every piece of information that pointed them to China.

AI, security experts divided

While there has not been a lot of research into how AI has powered cyber espionage operations, there is ample evidence showing that large language models have improved over the past year when prompted with cybersecurity-specific tasks. Earlier this year, startup XBOW saw its AI vulnerability scanning and patching tool top the leaderboards at bug bounty companies like HackerOne.

On the offensive side, earlier this year researchers at NYU developed a similar framework to the one used in the campaign Anthropic discovered, using a publicly available version of ChatGPT to automate large chunks of a ransomware attack. The Anthropic report is believed to be the first publicly known instance of a similar process being used by a nation-state to carry out successful attacks.

Even with these advancements, the campaign and Anthropic’s report has caused a stir within AI and cybersecurity circles, with some saying it validates existing fears around AI-enabled hacking, while others have alleged the report’s conclusions give a misleading impression about the current state of cyber-espionage operations.

Kevin Beaumont, a U.K.-based cybersecurity researcher, criticized Anthropic’s report for lacking transparency, and describing actions that are already achievable with existing tools, as well as leaving little room for external validation.

“The report has no indicators of compromise and the techniques it is talking about are all off-the-shelf things which have existing detections,” Beaumont wrote on LinkedIn Friday. “In terms of actionable intelligence, there’s nothing in the report.”

Klein told CyberScoop that Anthropic has shared indicators of compromise with tech firms, research labs and other entities that have information-sharing agreements with the company.

“Within private circles, we are sharing, it’s just not something that we wanted to share with the general public,” he said.

Other observers argued that Anthropic’s findings still represent an important milestone in AI cybersecurity application.

Jen Easterly, former director of the Cybersecurity and Infrastructure Security Agency, echoed some of the security community’s concerns around transparency, even as she gave credit to Anthropic for disclosing the attacks.

“We still don’t know which tasks were truly accelerated by AI versus what could have been done with standard tooling,” Easterly wrote Friday on LinkedIn. “We don’t know how the agent chains operated, where the model hallucinated, how often humans had to intervene, or how reliable the outputs actually were. Without more specifics (prompts, code samples, failures, friction points), it’s obviously harder for defenders to learn, adapt, and anticipate what comes next.”

Tiffany Saade, an AI researcher with Cisco’s AI defense team, told CyberScoop that it’s clear from Anthropic’s report that using tools like Claude offers attackers speed-and-scale advantages.

“The question is, is that enough?” to incentivize hackers to use LLMs over other forms of automation and deal with its associated limitations, she asked. “Will we see agents also tipping towards sophistication in the attacks and what type of sophistication are we talking about?”

Saade noted that some aspects of the operation described by Anthropic don’t fit a purely espionage-focused Chinese group. She pointed out it was odd for the hackers to use a major U.S. AI model for automation when they have access to their own private models. Additionally, companies like Anthropic and OpenAI have far greater cybersecurity and threat intelligence resources than open-source models, making it likely any malicious activity using their platforms would be detected.

“We knew this was going to happen, but what’s astonishing to me is … if I’m a Chinese state-sponsored actor and I do want to use AI models with agentic capabilities to do autonomous hacking, I probably would not go to Claude to do that,” Saade noted. “I would probably build something in-house and under the hood. So they did want to be seen.”

Saade floated another potential motivation for the hack: geopolitical messaging to Washington D.C. that Beijing’s hackers can do precisely what everyone is afraid of them doing.

“Usually the goal is ‘we want stealth, we want to maintain persistence.’ … This is not even sabotage, it’s sending a message: hypothesis validated,” Saade said. “They want that noise, the breaking news, the ‘Anthropic is reporting’ [headlines]. They want that visibility, and there’s a reason they want that visibility.”

The post China’s ‘autonomous’ AI-powered hacking campaign still required a ton of human work appeared first on CyberScoop.

Getting Started with AI Hacking Part 2: Prompt Injection

Black Hills Information Security

By: BHIS

8 October 2025 at 12:11

In Part 2, we’re diving headfirst into one of the most critical attack surfaces in the LLM ecosystem - Prompt Injection: The AI version of talking your way past the bouncer.

The post Getting Started with AI Hacking Part 2: Prompt Injection appeared first on Black Hills Information Security, Inc..

Avoiding Dirty RAGs: Retrieval-Augmented Generation with Ollama and LangChain

Black Hills Information Security

By: BHIS

20 February 2025 at 10:33

RAG connects pre-trained LLMs with current data sources. Moreover, a RAG system can use many data sources.

The post Avoiding Dirty RAGs: Retrieval-Augmented Generation with Ollama and LangChain appeared first on Black Hills Information Security, Inc..

Reading view

Claude can break bad in other ways

AI, security experts divided