Normal view

There are new articles available, click to refresh the page.
Yesterday — 18 October 2025Main stream
Before yesterdayMain stream

OpenAI: Threat actors use us to be efficient, not make new tools

By: djohnson
7 October 2025 at 15:56

A long-running theme in the use of adversarial AI since the advent of large language models has been the automation and enhancement of well-established hacking methods, rather than the creation of new ones.  

That remains the case for much of OpenAI’s October threat report, which highlights how government agencies and the cybercriminal underground are opting to leverage AI to improve the efficiency or scale of their hacking tools and campaigns instead of reinventing the wheel.

“Repeatedly, and across different types of operations, the threat actors we banned were building AI into their existing workflows, rather than building new workflows around AI,” the report noted.

The majority of this activity still centers on familiar tasks like developing malware, command-and-control infrastructure, crafting more convincing spearphishing emails, and conducting reconnaissance on targeted people, organizations and technologies. 

Still, the latest research from OpenAI’s threat intelligence team does reveal some intriguing data points on how different governments and scammers around the world are attempting to leverage LLM technology in their operations.

One cluster of accounts seemed to focus specifically on several niche subjects known to be particular areas of interest for Chinese intelligence agencies. 

“The threat actors operating these accounts displayed hallmarks consistent with cyber operations conducted to service PRC intelligence requirements: Chinese language use and targeting of Taiwan’s semiconductor sector, U.S. academia and think tanks, and organizations associated with ethnic and political groups critical of the” Chinese government,” wrote authors Ben Nimmo, Kimo Bumanglag, Michael Flossman, Nathaniel Hartley, Lotus Ruan, Jack Stubbs and Albert Zhang.

According to OpenAI, the accounts also share technical overlaps with a publicly known Chinese cyber espionage group.

Perhaps unsatisfied with the American-made product, the accounts also seemed interested in querying ChatGPT with questions about how the same workflows could be established through DeepSeek — an alternative, open-weight Chinese model that may itself have been trained on a version of ChatGPT.

Another cluster of accounts likely tied to North Korea appeared to have taken a modular, factory-like approach to mining ChatGPT for offensive security insight. Each individual account was almost exclusively dedicated to exploring specific use cases, like building Chrome extensions to Safari for Apple App Store publication, configuring Windows Server VPNs, or developing macOS Finder extensions “rather than each account spanning multiple technical areas.”

OpenAI does not make any formal attribution to the North Korean government but notes that its services are blocked in the country and that the behavior of these accounts were “consistent” with the security community’s understanding of North Korean threat actors.

The company also identified other clusters tied to China that heavily used its platform to generate content for social media influence operations pushing pro-China sentiments to countries across the world. Some of the accounts have been loosely associated with a similar Chinese campaign called Spamouflage, though the OpenAI researchers did not make a formal connection.

The activity “shared behavioral traits similar to other China-origin covert influence operations, such as posting hashtags, images or videos disseminated by past operations and used stock images as profile photos or default social media handles, which made them easy to identify,” the researchers noted. 

Another trait the campaign shares with Spamouflage is its seeming ineffectiveness. 

“Most of the posts and social media accounts received minimal or no engagements. Often the only replies to or reposts of a post generated by this network on X and Instagram were by other social media accounts controlled by the operators of this network,” they added. 

OpenAI’s report does not cover Sora 2, its AI video creation tool. The tool’s deepfaking and disinformation capabilities have been the subject of longstanding concern since last year when it was announced, and in the week since its release the invite-only app has already shown a frightening potential for distorting reality.

A rising AI-fueled scam ecosystem and dual use “efficiency”

OpenAI also battles challenges from scammers who seek to use its products to automate or enhance online fraud schemes, ranging from lone actors refining their own personal scams to “scaled and persistent operators likely linked to organized crime groups.”

Most usage is unsurprising: basic research, translating phishing emails, and crafting content for influence campaigns.Yet, OpenAI’s research reveals that both state and non-state actors use AI as a development sandbox for malicious cyber activities and as an administrative tool to streamline their work.

One scam center likely located in Myanmar used ChatGPT “both to generate content for its fraudulent schemes and to conduct day-to-day business tasks,” like organizing schedules, writing internal announcements, assigning desks and living arrangements to workers and managing finances.

Others leveraged the tool in increasingly elaborate ways, like a Cambodian scam center that used it to generate “detailed” biographies for fake companies, executives and employees, then used the model to generate customized social media messages in those characters’ voices to make the scam appear more legitimate. In some cases, the same accounts returned to query ChatGPT on responses they received from target victims, indicating the scheme was somewhat successful.

Researchers also found an interesting dual-use dynamic: in addition to being used by scammers, many users look to ChatGPT for insight about potential scams they have encountered. 

“We have seen evidence of people using ChatGPT to help them identify and avoid online scams millions of times a month; in every scam operation in this report, we have seen the model help people correctly identify the scam and advise them on appropriate safety measures,” the OpenAI researchers claimed, while estimating that the tool is “being used to identify scams up to three times more often than it is being used for scams.”

Because OpenAI claims its model rejected nearly all “outright malicious requests,” in many cases threat intelligence professionals are sifting through clusters and accounts that operate in the “gray zone,” pushing the model to fulfill requests that are dual use in nature and not strictly illegal or against terms of service. For example, a process for improving a tool debugging, cryptography, or browser development can “take on a different significance when repurposed by a threat actor.”

“The activity we observed generally involved making otherwise innocuous requests … and likely utilizing them outside of our platform for malicious purposes,” the authors note.

One example: a group of Russian-speaking cybercriminals attempted to use ChatGPT to develop and refine malware, but when those initial requests were rejected, they pivoted to “eliciting building-block code … which the threat actor then likely assembled into malicious workflows.”

The same actors also prompted the model for obfuscation code, crypter patterns and exfiltration tools that could just as easily be used by cybersecurity defenders, but in this case the threat actors actually posted about their activity on a Russian cybercriminal Telegram.

“These outputs are not inherently malicious, unless used in such a way by a threat actor outside of our platform,” the authors claimed.

The post OpenAI: Threat actors use us to be efficient, not make new tools appeared first on CyberScoop.

Top AI companies have spent months working with US, UK governments on model safety

By: djohnson
15 September 2025 at 16:37

Both OpenAI and Anthropic said earlier this month they are working with the U.S. and U.K. governments to bolster the safety and security of their commercial large language models in order to make them harder to abuse or misuse.

In a pair of blogs posted to their websites Friday, the companies said for the past year or so they have been working with researchers at the National Institute of Standards and Technology’s U.S. Center for AI Standards for Innovation and the U.K. AI Security Institute.

That collaboration included granting government researchers access to the  companies’ models, classifiers, and training data. Its purpose has been to enable independent experts to assess how resilient the models are to outside attacks from malicious hackers, as well as their effectiveness in blocking legitimate users from leveraging the technology for legally or ethically questionable purposes.

OpenAI’s blog details the work with the institutes, which studied  the capabilities of ChatGPT in cyber, chemical-biological and “other national security relevant domains.”That partnership has since been expanded to newer products, including red-teaming the company’s AI agents and exploring new ways for OpenAI “to partner with external evaluators to find and fix security vulnerabilities.”

OpenAI already works with selected red-teamers who scour their products for vulnerabilities, so the announcement suggests the company may be exploring a separate red-teaming process for its AI agents.

According to OpenAI, the engagement with NIST yielded insights around two novel vulnerabilities affecting their systems. Those vulnerabilities “could have allowed a sophisticated attacker to bypass our security protections, and to remotely control the computer systems the agent could access for that session and successfully impersonate the user for other websites they’d logged into,” the company said.

Initially, engineers at OpenAI believed the vulnerabilities were unexploitable and “useless” due to existing security safeguards. But researchers identified a way to combine the vulnerabilities with a known AI hijacking technique — which corrupts the underlying context data the agent relies on to guide its behavior — that allowed them to take over another user’s agent with a 50% success rate.  

Between May and August, OpenAI worked  with researchers at the U.K. AI Security Institute to test and improve safeguards in GPT5 and ChatGPT Agent. The engagement focused on red-teaming the models to prevent biological misuse —  preventing the model from providing step-by-step instructions for making bombs, chemical or biological weapons.

The company said it provided the British government with non-public prototypes of its safeguard systems, test models stripped of any guardrails, internal policy guidance on its safety work, access to internal safety monitoring models and other bespoke tooling.

Anthropic also said it gave U.S. and U.K. government researchers access to its Claude AI systems for ongoing testing and research at different stages of development, as well as its classifier system for finding jailbreak vulnerabilities.

That work identified several prompt injection attacks that bypassed safety protections within Claude — again by poisoning the context the model relies on with hidden, malicious prompts — as well as a new universal jailbreak method capable of evading standard detection tools. The jailbreak vulnerability was so severe that Anthropic opted to restructure its entire safeguard architecture rather than attempt to patch it.

Anthropic said the collaboration taught the company that giving government red-teamers deeper access to their systems could lead to more sophisticated vulnerability discovery.

“Governments bring unique capabilities to this work, particularly deep expertise in national security areas like cybersecurity, intelligence analysis, and threat modeling that enables them to evaluate specific attack vectors and defense mechanisms when paired with their machine learning expertise,” Anthropic’s blog stated.

OpenAI and Anthropic’s work with the U.S. and U.K. comes as some AI safety and security experts have questioned whether those governments and AI companies may be deprioritizing technical safety guardrails as policymakers seek to give their domestic industries maximal freedom to compete with China and other competitors for global market dominance.

After coming into office, U.S. Vice President JD Vance downplayed the importance of AI safety at international summits, while British Labour Party Prime Minister Keir Starmer reportedly walked back a promise in the party’s election manifesto to enforce safety regulations on AI companies following Donald Trump’s election. A more symbolic example: both the U.S. and U.K. government AI institutes changed their names this earlier year to remove the word “safety.”

But the collaborations indicate that some of that work remains ongoing, and not every security researcher agrees that the models are necessarily getting worse.

Md Raz, a Ph.D student at New York University who is part of a team of researchers that study cybersecurity and AI systems, told CyberScoop that in his experience commercial models are getting harder, not easier, to jailbreak with each new release.

“Definitely over the past few years I think between GPT4 and GPT 5 … I saw a lot more guardrails in GPT5, where GPT5 will put the pieces together before it replies and sometimes it will say, ‘no, I’m not going to do that.’”

Other AI tools, like coding models “are a lot less thoughtful about the bigger picture” of what they’re being asked to do and whether it’s malicious or not, he added, while open-source models are “most likely to do what you say” and existing guardrails can be more easily circumvented.

The post Top AI companies have spent months working with US, UK governments on model safety appeared first on CyberScoop.

Guess what else GPT-5 is bad at? Security

By: djohnson
12 August 2025 at 13:38

On Aug. 7, OpenAI released GPT-5, its newest frontier large language model, to the public. Shortly after, all hell broke loose.

Billed as faster, smarter and more capable tools for enterprise organizations than previous models, GPT-5 has instead met an angry user base that has found its performance and reasoning skills wanting.

And in the five days since its release, security researchers have also noticed something about GPT-5: it completely fails on core security and safety metrics.

Since going public, OpenAI’s newest tool for businesses and organizations has been subjected to extensive tinkering by outside security researchers, many of whom identified vulnerabilities and weaknesses in GPT-5 that were already discovered and patched in older models.

AI red-teaming company SPLX subjected it to over 1,000 different attack scenarios, including prompt injection, data and context poisoning, jailbreaking and data exfiltration, finding the default version of GPT-5 “nearly unusable for enterprises” out of the box.

It scored just a 2.4% on an assessment for security, 13.6% for safety and 1.7% for “business alignment,” which SPLX describes as the model’s propensity for refusing tasks that are outside of its domain, leaking data or unwittingly promoting competing products.

Default versions of GPT-5 perform poorly on security, safety and business alignment, though they improve significantly with prompting. (Source: SPLX)

Ante Gojsalic, chief technology officer and co-founder of SPLX, told CyberScoop that his team was initially surprised at the level of poor security and lack of safety guardrails inherent in OpenAI’s newest model. Microsoft claimed that internal red-team testing on GPT-5 was done with “rigorous security protocols” and concluded it “exhibited one of the strongest AI safety profiles among prior OpenAI models against several modes of attack, including malware generation, fraud/scam automation and other harms.”

“Our expectation was GPT-5 will be better like they presented on all the benchmarks,” Gojsalic said. “And this was the key surprising moment, when we [did] our scan, we saw … it’s terrible. It’s far behind for all models, like on par with some open-source models and worse.”

In an Aug. 7 blog post published by Microsoft, Sarah Bird, chief product officer of responsible AI at the company, is quoted saying that the “Microsoft AI/Red Team found GPT-5 to have one of the strongest safety profiles of any OpenAI model.”

OpenAI’s system card for GPT-5 provides further details on how GPT-5 was tested for safety and security, saying the model underwent weeks of testing from the company’s internal red team and external third parties. These assessments focused on the pre-deployment phase, safeguards around the actual use of the model and vulnerabilities in connected APIs.

“Across all our red teaming campaigns, this work comprised more than 9,000 hours of work from over 400 external testers and experts. Our red team campaigns prioritized topics including violent attack planning, jailbreaks which reliably evade our safeguards, prompt injections, and bioweaponization,” the system card states.

Gojsalic explained the disparity in Microsoft and OpenAI’s claims and his company’s findings by pointing to other priorities those companies have when pushing out new frontier models.

All new commercial models are racing toward competency in a prescribed set of metrics that measure the kind of capabilities — such as code generation, mathematical formulas and life sciences like biology, physics and chemistry — that customers most covet. Scoring at the top of the leaderboard for these metrics is “basically a pre-requirement” for any newly released commercial model, he said.

High marks for security and safety do not rank similarly in importance, and Gojsalic said developers at OpenAI and Microsoft “probably did a very specific set of tests which are not industry relevant” to claim security and safety features were up to snuff.

In response to questions about the SPLX research, an OpenAI spokesperson said GPT-5 was tested using StrongReject, an academic benchmark developed last year by researchers at University of California, Berkeley used to test models against jailbreaking.

The spokesperson added: “We take steps to reduce the risk of malicious use, and we’re continually improving safeguards to make our models more robust against exploits like jailbreaks.”

Other cybersecurity researchers have claimed to have found significant vulnerabilities in GPT-5 less than a week after its release.

NeuralTrust, an AI-focused cybersecurity firm, said it identified a way to jailbreak the base model through context poisoning — an attack technique that manipulates the contextual information and instructions GPT-5 uses to learn more about specific projects or tasks they’re working on.

Using Echo Chamber, a jailbreaking technique first identified in June, the attacker can make a series of requests that lead the model into increasingly abstract mindsets, allowing it to slowly break free of its constraints.

“We showed that Echo Chamber, when combined with narrative-driven steering, can elicit harmful outputs from [GPT-5] without issuing explicitly malicious prompts,” wrote Martí Jordà, a cybersecurity software engineer at NeuralTrust. “This reinforces a key risk: keyword or intent-based filters are insufficient in multi-turn settings where context can be gradually poisoned and then echoed back under the guise of continuity.”

A day after GPT-5 was released, researchers at RSAC Labs and George Mason University released a study on agentic AI use in organizations, concluding that “AI-driven automation comes with a profound security cost.” Chiefly, attackers can use similar manipulation techniques to compromise the behavior of a wide range of models. While GPT-5 was not tested as part of their research, GPT-4o and 4.1 were. 

“We demonstrate that adversaries can manipulate system telemetry to mislead AIOps agents into taking actions that compromise the integrity of the infrastructure they manage,” the authors wrote. “We introduce techniques to reliably inject telemetry data using error-inducing requests that influence agent behavior through a form of adversarial input we call adversarial reward-hacking; plausible but incorrect system error interpretations that steer the agent’s decision-making.”

The post Guess what else GPT-5 is bad at? Security appeared first on CyberScoop.

DARPA’s AI Cyber Challenge reveals winning models for automated vulnerability discovery and patching

8 August 2025 at 17:53

The Pentagon’s two-year public competition to spur the development of cyber-reasoning systems that use large language models to autonomously find and patch vulnerabilities in open-source software concluded Friday with $8.5 million awarded to three teams of security specialists at DEF CON. 

The Defense Advanced Research Project Agency’s AI Cyber Challenge seeks to address a persistent bottleneck in cybersecurity — patching vulnerabilities before they are discovered or exploited by would-be attackers. 

“We’re living in a world right now that has ancient digital scaffolding that’s holding everything up,” DARPA Director Stephen Winchell said. “A lot of the code bases, a lot of the languages, a lot of the ways we do business, and everything we’ve built on top of it has all incurred huge technical debt… It is a problem that is beyond human scale.” 

The seven semifinalists that earned their spot out of 90 teams convened at last year’s DEF CON were scored against their models’ ability to quickly, accurately and successfully identify and generate patches for synthetic vulnerabilities across 54 million lines of code. The models discovered 77% of the vulnerabilities presented in the final scoring round and patched 61% of those synthetic defects at an average speed of 45 minutes, the competition organizers said.

The models also discovered 18 real zero-day vulnerabilities, including six in the C programming language and 12 in Java codebases. The teams’ models patched none of the C codebase zero-days, but automatically patched 11 of the Java zero-days, according to the final results shared Friday.

Team Atlanta took the first-place prize of $4 million, Trail of Bits won second place and $3 million in prize money, and Theori ranked third, taking home $1.5 million. The competition’s organizers allocated an additional $1.4 million in prize money for participants who can demonstrate when their technology is deployed into critical infrastructure. 

Representatives from the three winning teams said they plan to reinvest the majority of the prize money back into research and further development of their cyber-reasoning systems or explore ways to commercialize the technology.

Four of the models developed under the competition were made available as open source Friday, and the three remaining models will be released in the coming weeks, officials said.

“Our hope is this technology will harden source code by being integrated during the development stage, the most critical point in the software lifecycle,” Andrew Carney, program manager of the competition, said during a media briefing about the challenge last week. 

Open sourcing the cyber-reasoning systems and the AI Cyber Challenge’s infrastructure should also allow others to experiment and improve upon what the competition helped foster, he said. DARPA and partners across government and the private sector involved in the program are pursuing paths to push the technology developed during the competition into open-source software communities and commercial vendors for broader adoption.

DARPA’s AI Cyber Challenge is a public-private endeavor, with Google, Microsoft, Anthropic and OpenAI each donating $350,000 in LLM credits and additional support. The initiative seeks to test AI’s ability to identify and patch vulnerabilities in open-source code of vital importance throughout critical infrastructure, including health care. 

Jim O’Neill, deputy secretary of the Department of Health and Human Services, spoke to the importance of this work during the AI Cyber Challenge presentation at DEF CON. “Health systems are among the hardest networks to secure. Unlike other industries, hospitals must maintain 24/7 uptime, and they don’t get to reboot. They rely on highly specialized, legacy devices and complex IT ecosystems,” he said. 

“As a result, patching a vulnerability in health care can take an average of 491 days, compared to 60 to 90 days in most other industries,” O’Neill added. “Many cybersecurity products, unfortunately, are security theater. We need assertive proof-of-work approaches to keep networks, hospitals and patients safer.”

Health officials and others directly involved in the AI Cyber Challenge acknowledged the problems posed by insecure software are vast, but said the results showcased from this effort provide a glimmer of hope. 

“The magnitude of the problem is so incredibly overwhelming and unreasonable that this is starting to make it so that maybe we can actually secure networks — maybe,” Jennifer Roberts, director of resilient systems at HHS’s Advanced Research Projects Agency for Health, said during a media briefing at DEF CON after the winners were announced. 

Kathleen Fisher, director of DARPA’s Information Innovation Office, shared a similar cautiously optimistic outlook. “Software runs the world, and the software that is running the world is riddled with vulnerabilities,” she said.

“We have this sense of learned helplessness, that there’s just nothing we can do about it. That’s the way software is,” she continued. The AI Cyber Challenge “points to a brighter future where software does what it’s supposed to do and nothing else.”

The post DARPA’s AI Cyber Challenge reveals winning models for automated vulnerability discovery and patching appeared first on CyberScoop.

❌
❌