
Researchers flag code that uses AI systems to carry out ransomware attacks

By: djohnson
26 August 2025 at 16:20

Researchers at cybersecurity firm ESET claim to have identified the first piece of AI-powered ransomware in the wild.

The malware, called PromptLock, essentially functions as a hard-coded prompt injection attack on a large language model, causing the model to assist in carrying out a ransomware attack.

Written in the Go programming language, the malware sends its requests through the API of Ollama, an open-source tool for running large language models locally, to a local instance of OpenAI’s open-weight gpt-oss:20b model, which it uses to execute tasks.

Those tasks include inspecting local filesystems, exfiltrating files and encrypting data on Windows, macOS and Linux devices using 128-bit SPECK encryption.
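
SPECK is a lightweight block cipher family; ESET’s report does not spell out which parameter set PromptLock uses, so the sketch below assumes the Speck128/128 variant (128-bit block, 128-bit key, 32 rounds). It is a minimal Go illustration of the published key schedule and round function only, not PromptLock’s code, and the byte ordering, sample key and single-block usage are assumptions.

    package main

    import (
        "encoding/binary"
        "fmt"
        "math/bits"
    )

    const rounds = 32 // Speck128/128 uses 32 rounds

    // expandKey derives the round keys from a 128-bit key split into two
    // 64-bit words, following the published Speck key schedule.
    func expandKey(k0, l0 uint64) [rounds]uint64 {
        var rk [rounds]uint64
        rk[0] = k0
        for i := uint64(0); i < rounds-1; i++ {
            l0 = (bits.RotateLeft64(l0, -8) + rk[i]) ^ i
            rk[i+1] = bits.RotateLeft64(rk[i], 3) ^ l0
        }
        return rk
    }

    // encryptBlock applies the 32 Speck rounds to one 16-byte block in place.
    func encryptBlock(block []byte, rk [rounds]uint64) {
        y := binary.LittleEndian.Uint64(block[:8])
        x := binary.LittleEndian.Uint64(block[8:16])
        for _, k := range rk {
            x = (bits.RotateLeft64(x, -8) + y) ^ k // rotate right by 8, add, xor round key
            y = bits.RotateLeft64(y, 3) ^ x        // rotate left by 3, xor
        }
        binary.LittleEndian.PutUint64(block[:8], y)
        binary.LittleEndian.PutUint64(block[8:16], x)
    }

    func main() {
        rk := expandKey(0x0706050403020100, 0x0f0e0d0c0b0a0908) // arbitrary sample key

        block := []byte("16-byte  example") // exactly one 128-bit block
        encryptBlock(block, rk)
        fmt.Printf("ciphertext: %x\n", block)
    }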

According to senior malware researcher Anton Cherepanov, the code was discovered Aug. 25 by ESET on VirusTotal, an online repository for malware analysis. Beyond knowing that it was uploaded somewhere in the U.S., he had no further details on its origins.

“Notably, attackers don’t need to deploy the entire gpt-oss-20b model within the compromised network,” he said. “Instead, they can simply establish a tunnel or proxy from the affected network to a server running Ollama with the model.”
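
As a rough illustration of that setup, the Go sketch below sends a single hard-coded prompt to the /api/generate endpoint that Ollama exposes and reads back the model’s reply. The endpoint address, prompt text and response handling are assumptions for illustration, not PromptLock’s actual code; swapping the local address for a tunneled or proxied host is all it would take to reach a remote Ollama server instead.

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // generateRequest mirrors the fields Ollama's /api/generate endpoint expects.
    type generateRequest struct {
        Model  string `json:"model"`
        Prompt string `json:"prompt"`
        Stream bool   `json:"stream"`
    }

    // generateResponse holds the single field we care about from Ollama's reply.
    type generateResponse struct {
        Response string `json:"response"`
    }

    func main() {
        // Default local Ollama address; a tunnel or proxy could point this at a remote server.
        endpoint := "http://127.0.0.1:11434/api/generate"

        body, _ := json.Marshal(generateRequest{
            Model:  "gpt-oss:20b",
            Prompt: "Write a Lua script that lists the files in the current directory.", // benign stand-in for an embedded prompt
            Stream: false,
        })

        resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
        if err != nil {
            fmt.Println("request failed:", err)
            return
        }
        defer resp.Body.Close()

        var out generateResponse
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            fmt.Println("decode failed:", err)
            return
        }
        fmt.Println(out.Response) // the model-generated Lua script
    }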

ESET believes the code is likely a proof of concept, noting that functionality for a feature that destroys data appears unfinished. Cherepanov told CyberScoop that ESET has yet to see evidence in its telemetry of the malware being deployed by threat actors.

“Although multiple indicators suggest the sample is a proof-of-concept (PoC) or work-in-progress rather than fully operational malware deployed in the wild, we believe it is our responsibility to inform the cybersecurity community about such developments,” the company said on X.

In screenshots provided by ESET, the ransomware code embeds instructions to the LLM, telling it to generate malicious Lua scripts, to check the contents of files for personally identifiable information and, using its “analysis mode,” to generate a ransom note based on what the model thought a ransomware actor might write.

It also provided a sample Bitcoin address to use when demanding payment, one that appears to be the known address of the cryptocurrency’s pseudonymous creator, Satoshi Nakamoto.

It’s a novel example of leveraging security holes in the prompting process, inducing an AI program to carry out the core functions of ransomware: locking files, stealing data, threatening and extorting victims and extracting payment.

Researchers in AI security are increasingly highlighting the potential risk for businesses and organizations that deploy AI “agents” in their networks, noting that these programs must be given high-level administrative access to carry out their jobs, are vulnerable to prompt injection attacks and can be turned against their owners.

Because the malware relies on scripts generated by AI, Cherepanov said one difference between PromptLock and other ransomware “is that indicators of compromise (IoCs) may vary from one execution to another.”

“Theoretically, if properly implemented, this could significantly complicate detection and make defenders’ jobs more difficult,” he noted.


Guess what else GPT-5 is bad at? Security

By: djohnson
12 August 2025 at 13:38

On Aug. 7, OpenAI released GPT-5, its newest frontier large language model, to the public. Shortly after, all hell broke loose.

Billed as a faster, smarter and more capable tool for enterprise organizations than previous models, GPT-5 has instead been met by an angry user base that has found its performance and reasoning skills wanting.

And in the five days since its release, security researchers have also noticed something about GPT-5: it completely fails on core security and safety metrics.

Since going public, OpenAI’s newest tool for businesses and organizations has been subjected to extensive tinkering by outside security researchers, many of whom identified vulnerabilities and weaknesses in GPT-5 that had already been discovered and patched in older models.

AI red-teaming company SPLX subjected it to over 1,000 different attack scenarios, including prompt injection, data and context poisoning, jailbreaking and data exfiltration, finding the default version of GPT-5 “nearly unusable for enterprises” out of the box.

It scored just 2.4% on an assessment for security, 13.6% for safety and 1.7% for “business alignment,” which SPLX describes as the model’s ability to refuse tasks outside its domain and to avoid leaking data or unwittingly promoting competing products.

Default versions of GPT-5 perform poorly on security, safety and business alignment, though they improve significantly with prompting. (Source: SPLX)

Ante Gojsalic, chief technology officer and co-founder of SPLX, told CyberScoop that his team was initially surprised by how poor the security and safety guardrails of OpenAI’s newest model turned out to be. Microsoft, by contrast, claimed that internal red-team testing on GPT-5 was done with “rigorous security protocols” and concluded it “exhibited one of the strongest AI safety profiles among prior OpenAI models against several modes of attack, including malware generation, fraud/scam automation and other harms.”

“Our expectation was GPT-5 will be better like they presented on all the benchmarks,” Gojsalic said. “And this was the key surprising moment, when we [did] our scan, we saw … it’s terrible. It’s far behind for all models, like on par with some open-source models and worse.”

In an Aug. 7 blog post published by Microsoft, Sarah Bird, chief product officer of responsible AI at the company, is quoted as saying that the “Microsoft AI/Red Team found GPT-5 to have one of the strongest safety profiles of any OpenAI model.”

OpenAI’s system card for GPT-5 provides further details on how GPT-5 was tested for safety and security, saying the model underwent weeks of testing from the company’s internal red team and external third parties. These assessments focused on the pre-deployment phase, safeguards around the actual use of the model and vulnerabilities in connected APIs.

“Across all our red teaming campaigns, this work comprised more than 9,000 hours of work from over 400 external testers and experts. Our red team campaigns prioritized topics including violent attack planning, jailbreaks which reliably evade our safeguards, prompt injections, and bioweaponization,” the system card states.

Gojsalic explained the disparity between Microsoft’s and OpenAI’s claims and his company’s findings by pointing to the other priorities those companies have when pushing out new frontier models.

All new commercial models race toward competency on a prescribed set of metrics that measure the capabilities customers most covet, such as code generation, mathematical reasoning and sciences like biology, physics and chemistry. Scoring at the top of the leaderboard for these metrics is “basically a pre-requirement” for any newly released commercial model, he said.

High marks for security and safety do not rank similarly in importance, and Gojsalic said developers at OpenAI and Microsoft “probably did a very specific set of tests which are not industry relevant” to claim security and safety features were up to snuff.

In response to questions about the SPLX research, an OpenAI spokesperson said GPT-5 was tested using StrongReject, an academic benchmark developed last year by researchers at the University of California, Berkeley, to test models’ resistance to jailbreaking.

The spokesperson added: “We take steps to reduce the risk of malicious use, and we’re continually improving safeguards to make our models more robust against exploits like jailbreaks.”

Other cybersecurity researchers have claimed to have found significant vulnerabilities in GPT-5 less than a week after its release.

NeuralTrust, an AI-focused cybersecurity firm, said it identified a way to jailbreak the base model through context poisoning, an attack technique that manipulates the contextual information and instructions GPT-5 uses to learn more about the specific project or task it is working on.

Using Echo Chamber, a jailbreaking technique first identified in June, the attacker can make a series of requests that lead the model into increasingly abstract mindsets, allowing it to slowly break free of its constraints.

“We showed that Echo Chamber, when combined with narrative-driven steering, can elicit harmful outputs from [GPT-5] without issuing explicitly malicious prompts,” wrote Martí Jordà, a cybersecurity software engineer at NeuralTrust. “This reinforces a key risk: keyword or intent-based filters are insufficient in multi-turn settings where context can be gradually poisoned and then echoed back under the guise of continuity.”

A day after GPT-5 was released, researchers at RSAC Labs and George Mason University released a study on agentic AI use in organizations, concluding that “AI-driven automation comes with a profound security cost.” Chief among the findings: attackers can use similar manipulation techniques to compromise the behavior of a wide range of models. While GPT-5 was not tested as part of their research, GPT-4o and 4.1 were.

“We demonstrate that adversaries can manipulate system telemetry to mislead AIOps agents into taking actions that compromise the integrity of the infrastructure they manage,” the authors wrote. “We introduce techniques to reliably inject telemetry data using error-inducing requests that influence agent behavior through a form of adversarial input we call adversarial reward-hacking; plausible but incorrect system error interpretations that steer the agent’s decision-making.”

