
Top AI companies have spent months working with US, UK governments on model safety

Both OpenAI and Anthropic said earlier this month they are working with the U.S. and U.K. governments to bolster the safety and security of their commercial large language models in order to make them harder to abuse or misuse.

In a pair of blogs posted to their websites Friday, the companies said that for the past year or so they have been working with researchers at the National Institute of Standards and Technology’s Center for AI Standards and Innovation and the U.K. AI Security Institute.

That collaboration included granting government researchers access to the companies’ models, classifiers, and training data. Its purpose has been to enable independent experts to assess how resilient the models are to outside attacks from malicious hackers, as well as their effectiveness in blocking legitimate users from leveraging the technology for legally or ethically questionable purposes.

OpenAI’s blog details the work with the institutes, which studied the capabilities of ChatGPT in cyber, chemical-biological and “other national security relevant domains.” That partnership has since been expanded to newer products, including red-teaming the company’s AI agents and exploring new ways for OpenAI “to partner with external evaluators to find and fix security vulnerabilities.”

OpenAI already works with selected red-teamers who scour its products for vulnerabilities, so the announcement suggests the company may be exploring a separate red-teaming process for its AI agents.

According to OpenAI, the engagement with NIST yielded insights into two novel vulnerabilities affecting its systems. Those vulnerabilities “could have allowed a sophisticated attacker to bypass our security protections, and to remotely control the computer systems the agent could access for that session and successfully impersonate the user for other websites they’d logged into,” the company said.

Initially, engineers at OpenAI believed the vulnerabilities were unexploitable and “useless” due to existing security safeguards. But researchers identified a way to combine the vulnerabilities with a known AI hijacking technique — which corrupts the underlying context data the agent relies on to guide its behavior — that allowed them to take over another user’s agent with a 50% success rate.  

Between May and August, OpenAI worked with researchers at the U.K. AI Security Institute to test and improve safeguards in GPT-5 and ChatGPT Agent. The engagement focused on red-teaming the models to prevent biological misuse, such as the model providing step-by-step instructions for making bombs or chemical and biological weapons.

The company said it provided the British government with non-public prototypes of its safeguard systems, test models stripped of any guardrails, internal policy guidance on its safety work, access to internal safety monitoring models and other bespoke tooling.

Anthropic also said it gave U.S. and U.K. government researchers access to its Claude AI systems for ongoing testing and research at different stages of development, as well as its classifier system for finding jailbreak vulnerabilities.

That work identified several prompt injection attacks that bypassed safety protections within Claude — again by poisoning the context the model relies on with hidden, malicious prompts — as well as a new universal jailbreak method capable of evading standard detection tools. The jailbreak vulnerability was so severe that Anthropic opted to restructure its entire safeguard architecture rather than attempt to patch it.

Anthropic said the collaboration taught the company that giving government red-teamers deeper access to its systems could lead to more sophisticated vulnerability discovery.

“Governments bring unique capabilities to this work, particularly deep expertise in national security areas like cybersecurity, intelligence analysis, and threat modeling that enables them to evaluate specific attack vectors and defense mechanisms when paired with their machine learning expertise,” Anthropic’s blog stated.

OpenAI and Anthropic’s work with the U.S. and U.K. comes as some AI safety and security experts have questioned whether those governments and AI companies are deprioritizing technical safety guardrails as policymakers seek to give their domestic industries maximal freedom to compete with China and other rivals for global market dominance.

After coming into office, U.S. Vice President JD Vance downplayed the importance of AI safety at international summits, while British Labour Party Prime Minister Keir Starmer reportedly walked back a promise in the party’s election manifesto to enforce safety regulations on AI companies following Donald Trump’s election. A more symbolic example: both the U.S. and U.K. government AI institutes changed their names earlier this year to remove the word “safety.”

But the collaborations indicate that some of that work remains ongoing, and not every security researcher agrees that the models are necessarily getting worse.

Md Raz, a Ph.D. student at New York University who is part of a team of researchers studying cybersecurity and AI systems, told CyberScoop that in his experience commercial models are getting harder, not easier, to jailbreak with each new release.

“Definitely over the past few years I think between GPT-4 and GPT-5 … I saw a lot more guardrails in GPT-5, where GPT-5 will put the pieces together before it replies and sometimes it will say, ‘no, I’m not going to do that.’”

Other AI tools, like coding models, “are a lot less thoughtful about the bigger picture” of what they’re being asked to do and whether it’s malicious or not, he added, while open-source models are “most likely to do what you say” and existing guardrails can be more easily circumvented.

The post Top AI companies have spent months working with US, UK governments on model safety appeared first on CyberScoop.

Bypassing CSP with JSONP: Introducing JSONPeek and CSP B Gone

A Content Security Policy (CSP) is a security mechanism implemented by web servers and enforced by browsers to prevent various types of attacks, primarily cross-site scripting (XSS). CSP works by restricting resources (scripts, stylesheets, images, etc.) on a webpage to only execute if they come from approved sources. However, like most things in security, CSP isn't bulletproof.
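
To make the bypass concrete, here is a minimal sketch, not taken from the Black Hills post, of how an allow-listed origin that exposes a JSONP endpoint undermines an otherwise strict policy. The Flask app, the trusted.example hostname and the /jsonp route are hypothetical stand-ins; the key behavior is that the endpoint reflects an attacker-chosen callback parameter as executable JavaScript.

```python
# Hypothetical sketch: a "trusted" origin that serves JSONP, which defeats a
# CSP that allow-lists that origin. Not code from the BHIS post or its tools.
from flask import Flask, request

app = Flask(__name__)

@app.after_request
def add_csp(response):
    # The policy blocks inline scripts and unknown origins, but any script
    # loaded from trusted.example is permitted to run.
    response.headers["Content-Security-Policy"] = (
        "default-src 'self'; script-src https://trusted.example"
    )
    return response

@app.route("/jsonp")
def jsonp():
    # Classic JSONP: the caller names the wrapper function. Because the
    # callback is reflected verbatim, a tag like
    #   <script src="https://trusted.example/jsonp?callback=alert(document.domain)//"></script>
    # returns attacker-chosen JavaScript from an allow-listed origin, so the
    # browser executes it without violating the CSP above.
    callback = request.args.get("callback", "handleData")
    body = f'{callback}({{"user": "demo"}})'
    return body, 200, {"Content-Type": "application/javascript"}

if __name__ == "__main__":
    app.run()
```

Finding that reflected-callback behavior on domains a target’s CSP already trusts is presumably where tooling like JSONPeek comes in.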

The post Bypassing CSP with JSONP: Introducing JSONPeek and CSP B Gone appeared first on Black Hills Information Security, Inc..

Guess what else GPT-5 is bad at? Security

On Aug. 7, OpenAI released GPT-5, its newest frontier large language model, to the public. Shortly after, all hell broke loose.

Billed as faster, smarter and more capable than previous models for enterprise organizations, GPT-5 has instead met an angry user base that has found its performance and reasoning skills wanting.

And in the five days since its release, security researchers have also noticed something about GPT-5: it completely fails on core security and safety metrics.

Since its public release, OpenAI’s newest tool for businesses and organizations has been subjected to extensive tinkering by outside security researchers, many of whom identified vulnerabilities and weaknesses in GPT-5 that had already been discovered and patched in older models.

AI red-teaming company SPLX subjected it to over 1,000 different attack scenarios, including prompt injection, data and context poisoning, jailbreaking and data exfiltration, finding the default version of GPT-5 “nearly unusable for enterprises” out of the box.

It scored just 2.4% on an assessment for security, 13.6% for safety and 1.7% for “business alignment,” which SPLX describes as the model’s propensity to refuse tasks outside its domain and to avoid leaking data or unwittingly promoting competing products.

Default versions of GPT-5 perform poorly on security, safety and business alignment, though they improve significantly with prompting. (Source: SPLX)

Ante Gojsalic, chief technology officer and co-founder of SPLX, told CyberScoop that his team was initially surprised by the poor security and lack of safety guardrails in OpenAI’s newest model. Microsoft had claimed that internal red-team testing on GPT-5 was done with “rigorous security protocols” and concluded it “exhibited one of the strongest AI safety profiles among prior OpenAI models against several modes of attack, including malware generation, fraud/scam automation and other harms.”

“Our expectation was GPT-5 will be better like they presented on all the benchmarks,” Gojsalic said. “And this was the key surprising moment, when we [did] our scan, we saw … it’s terrible. It’s far behind for all models, like on par with some open-source models and worse.”

In an Aug. 7 blog post published by Microsoft, Sarah Bird, chief product officer of responsible AI at the company, is quoted saying that the “Microsoft AI/Red Team found GPT-5 to have one of the strongest safety profiles of any OpenAI model.”

OpenAI’s system card for GPT-5 provides further details on how the model was tested for safety and security, saying it underwent weeks of testing from the company’s internal red team and external third parties. These assessments focused on the pre-deployment phase, safeguards around the actual use of the model and vulnerabilities in connected APIs.

“Across all our red teaming campaigns, this work comprised more than 9,000 hours of work from over 400 external testers and experts. Our red team campaigns prioritized topics including violent attack planning, jailbreaks which reliably evade our safeguards, prompt injections, and bioweaponization,” the system card states.

Gojsalic explained the disparity between Microsoft and OpenAI’s claims and his company’s findings by pointing to other priorities those companies have when pushing out new frontier models.

All new commercial models are racing toward competency on a prescribed set of metrics that measure the kinds of capabilities customers most covet, such as code generation, mathematics and sciences like biology, physics and chemistry. Scoring at the top of the leaderboard for these metrics is “basically a pre-requirement” for any newly released commercial model, he said.

High marks for security and safety do not rank similarly in importance, and Gojsalic said developers at OpenAI and Microsoft “probably did a very specific set of tests which are not industry relevant” to claim security and safety features were up to snuff.

In response to questions about the SPLX research, an OpenAI spokesperson said GPT-5 was tested using StrongReject, an academic benchmark developed last year by researchers at the University of California, Berkeley, to test models’ resistance to jailbreaking.

The spokesperson added: “We take steps to reduce the risk of malicious use, and we’re continually improving safeguards to make our models more robust against exploits like jailbreaks.”

Other cybersecurity researchers have claimed to have found significant vulnerabilities in GPT-5 less than a week after its release.

NeuralTrust, an AI-focused cybersecurity firm, said it identified a way to jailbreak the base model through context poisoning, an attack technique that manipulates the contextual information and instructions GPT-5 uses to learn more about the specific projects or tasks it is working on.

Using Echo Chamber, a jailbreaking technique first identified in June, the attacker can make a series of requests that lead the model into increasingly abstract mindsets, allowing it to slowly break free of its constraints.

“We showed that Echo Chamber, when combined with narrative-driven steering, can elicit harmful outputs from [GPT-5] without issuing explicitly malicious prompts,” wrote Martí Jordà, a cybersecurity software engineer at NeuralTrust. “This reinforces a key risk: keyword or intent-based filters are insufficient in multi-turn settings where context can be gradually poisoned and then echoed back under the guise of continuity.”
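
The point about multi-turn filtering is easy to demonstrate in the abstract. The toy filter below is a hypothetical illustration, not NeuralTrust’s methodology or the Echo Chamber technique itself: each message passes a per-turn keyword check on its own, while the same check over the accumulated conversation trips, which is why screening single prompts in isolation misses gradually poisoned context.

```python
# Toy illustration (not NeuralTrust's method): a per-turn keyword filter passes
# every individual message, but the accumulated conversation trips the same
# filter once the turns are considered together.
BLOCKLIST = {"build a weapon"}

def trips_filter(text: str) -> bool:
    """Naive keyword check of the kind many single-prompt content filters use."""
    return any(term in text.lower() for term in BLOCKLIST)

turns = [
    "Let's write a scene about a props department.",
    "The lead character needs to build a",
    "weapon prop that looks convincing on camera.",
]

print([trips_filter(t) for t in turns])  # [False, False, False]: each turn passes
print(trips_filter(" ".join(turns)))     # True: the joined context matches the blocklist
```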

A day after GPT-5 was released, researchers at RSAC Labs and George Mason University released a study on agentic AI use in organizations, concluding that “AI-driven automation comes with a profound security cost.” Chiefly, attackers can use similar manipulation techniques to compromise the behavior of a wide range of models. While GPT-5 was not tested as part of their research, GPT-4o and 4.1 were. 

“We demonstrate that adversaries can manipulate system telemetry to mislead AIOps agents into taking actions that compromise the integrity of the infrastructure they manage,” the authors wrote. “We introduce techniques to reliably inject telemetry data using error-inducing requests that influence agent behavior through a form of adversarial input we call adversarial reward-hacking; plausible but incorrect system error interpretations that steer the agent’s decision-making.”

The post Guess what else GPT-5 is bad at? Security appeared first on CyberScoop.

Getting Started with NetExec: Streamlining Network Discovery and Access

One tool that I can't live without when performing a penetration test in an Active Directory environment is called NetExec. Being able to efficiently authenticate against multiple systems in the network is crucial, and NetExec is an incredibly powerful tool that helps automate a lot of this activity.
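
As a rough idea of what that automation looks like in practice, here is a minimal sketch of driving NetExec from a Python wrapper to check one set of credentials across a subnet. The wrapper function, target range and credentials are placeholders rather than content from the BHIS post, and it assumes the netexec binary is installed and on PATH.

```python
# Minimal sketch: wrap the NetExec CLI to spray one credential pair across a
# subnet and return the raw output. Targets and credentials are placeholders.
import subprocess

def netexec_smb(targets: str, username: str, password: str) -> str:
    """Run `netexec smb <targets> -u <user> -p <pass>` and return stdout."""
    result = subprocess.run(
        ["netexec", "smb", targets, "-u", username, "-p", password],
        capture_output=True,
        text=True,
        check=False,  # NetExec may exit nonzero even when the output is useful
    )
    return result.stdout

if __name__ == "__main__":
    print(netexec_smb("192.168.56.0/24", "pentester", "Autumn2025!"))
```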

The post Getting Started with NetExec: Streamlining Network Discovery and Access appeared first on Black Hills Information Security, Inc..

How to Design and Execute Effective Social Engineering Attacks by Phone

Social engineering is the manipulation of individuals into divulging confidential information, granting unauthorized access, or performing actions that benefit the attacker, all without the victim realizing they are being tricked.

The post How to Design and Execute Effective Social Engineering Attacks by Phone appeared first on Black Hills Information Security, Inc..

Abusing S4U2Self for Active Directory Pivoting

TL;DR If you only have access to a valid machine account hash, you can leverage the Kerberos S4U2Self extension for local privilege escalation, which allows reopening and expanding potential local-to-domain pivoting paths, such as SeImpersonate!

The post Abusing S4U2Self for Active Directory Pivoting appeared first on Black Hills Information Security, Inc..

Augmenting Penetration Testing Methodology with Artificial Intelligence – Part 1: Burpference

Burpference is a Burp Suite plugin that takes requests and responses to and from in-scope web applications and sends them off to an LLM for inference. In the context of artificial intelligence, inference is taking a trained model, providing it with new information, and asking it to analyze this new information based on its training.
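
As a rough sketch of that inference step, and not Burpference’s actual implementation, the snippet below hands a captured request/response pair to an LLM and asks it to flag anything interesting. The OpenAI client, model name and prompt wording are assumptions standing in for whichever backend the plugin is configured to use.

```python
# Hypothetical sketch of the inference step: send one captured HTTP exchange to
# an LLM for analysis. This is not Burpference's code; client/model are assumed.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def infer_on_exchange(request_text: str, response_text: str) -> str:
    """Ask the model to point out security-relevant findings in one exchange."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You review HTTP requests and responses for security issues."},
            {"role": "user",
             "content": (f"Request:\n{request_text}\n\nResponse:\n{response_text}\n\n"
                         "List potential vulnerabilities, misconfigurations, or "
                         "interesting behavior worth manual follow-up.")},
        ],
    )
    return completion.choices[0].message.content
```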

The post Augmenting Penetration Testing Methodology with Artificial Intelligence – Part 1: Burpference appeared first on Black Hills Information Security, Inc..

Offline Memory Forensics With Volatility

Volatility is a memory forensics tool that can pull SAM hashes from a vmem file. These hashes can be used to escalate from a local user, or no user, to a domain user, leading to further compromise.
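
For a sense of the workflow, here is a minimal sketch under the assumption that Volatility 3 is installed: point its windows.hashdump plugin at the .vmem file and collect the SAM hashes it prints. The image path is a placeholder.

```python
# Minimal sketch, assuming Volatility 3 is on PATH: run the windows.hashdump
# plugin against a memory image and return its output. The path is a placeholder.
import subprocess

def dump_sam_hashes(vmem_path: str) -> str:
    """Invoke `vol -f <image> windows.hashdump` and return the plugin output."""
    result = subprocess.run(
        ["vol", "-f", vmem_path, "windows.hashdump"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(dump_sam_hashes("victim-workstation.vmem"))
```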

The post Offline Memory Forensics With Volatility appeared first on Black Hills Information Security, Inc..

Why Your Org Needs a Penetration Test Program

This webcast originally aired on February 27, 2025. Join us for a very special free one-hour Black Hills Information Security webcast with Corey Ham & Kelli Tarala on why your […]

The post Why Your Org Needs a Penetration Test Program appeared first on Black Hills Information Security, Inc..

Gone Phishing: Installing GoPhish and Creating a Campaign

GoPhish provides a nice platform for creating and running phishing campaigns. This blog will guide you through installing GoPhish and creating a campaign. 

The post Gone Phishing: Installing GoPhish and Creating a Campaign appeared first on Black Hills Information Security, Inc..

5 Things We Are Going to Continue to Ignore in 2025

In this video, John Strand discusses the complexities and challenges of penetration testing, emphasizing that it goes beyond just finding and exploiting vulnerabilities.

The post 5 Things We Are Going to Continue to Ignore in 2025 appeared first on Black Hills Information Security, Inc..

Attack Tactics 9: Shadow Creds for PrivEsc w/ Kent & Jordan

In this video, Kent Ickler and Jordan Drysdale discuss Attack Tactics 9: Shadow Credentials for PrivEsc, focusing on a specific technique used in penetration testing services at Black Hills Information Security.

The post Attack Tactics 9: Shadow Creds for PrivEsc w/ Kent & Jordan appeared first on Black Hills Information Security, Inc..

DLL Hijacking – A New Spin on Proxying your Shellcode

This webcast was originally published on October 4, 2024. In this video, experts delve into the intricacies of DLL hijacking and new techniques for malicious code proxying, featuring a comprehensive […]

The post DLL Hijacking – A New Spin on Proxying your Shellcode appeared first on Black Hills Information Security, Inc..

Blue Team, Red Team, and Purple Team: An Overview

By Erik Goldoff, Ray Van Hoose, and Max Boehner || Guest Authors. This post comprises three articles that were originally published in the second edition of the InfoSec […]

The post Blue Team, Red Team, and Purple Team: An Overview appeared first on Black Hills Information Security, Inc..

Proxying Your Way to Code Execution – A Different Take on DLL Hijacking 

While DLL hijacking attacks can take many different forms, this blog post explores a specific type of attack called DLL proxying, providing insights into how it works, the potential risks it poses, and, briefly, the methodology for discovering these vulnerable DLLs. That methodology led to the discovery of several zero-day vulnerable DLLs that Microsoft has acknowledged but opted not to fix at this time.

The post Proxying Your Way to Code Execution – A Different Take on DLL Hijacking  appeared first on Black Hills Information Security, Inc..

How to Perform and Combat Social Engineering

This article was originally published in the second edition of the InfoSec Survival Guide. Find it free online HERE or order your $1 physical copy on the Spearphish General Store. […]

The post How to Perform and Combat Social Engineering appeared first on Black Hills Information Security, Inc..

WifiForge – WiFi Exploitation for the Classroom

By William Oldert // BHIS Intern. BHIS had a problem. We needed an environment for students to learn WiFi hacking safely. Our original solution used interconnected physical network gear […]

The post WifiForge – WiFi Exploitation for the Classroom appeared first on Black Hills Information Security, Inc..

Introducing SlackEnum: A User Enumeration Tool for Slack

Recently, as part of our ANTISOC Continuous Penetration Testing (CPT) service, I had an opportunity to investigate how attackers can leverage Slack in cyber-attacks, similar to how we frequently use […]

The post Introducing SlackEnum: A User Enumeration Tool for Slack appeared first on Black Hills Information Security, Inc..
