
AI might cut false positives, but it won’t stop the slop 

By: djohnson
18 May 2026 at 16:45

As defenders get their hands on newer AI models with more powerful cybersecurity capabilities like Anthropic’s Mythos and OpenAI’s Daybreak, organizations are being told to prepare for a flood of new vulnerability reports.

But for bug bounty programs across the nation, that day may already be here, as yesterday’s frontier models and today’s open-source AI tools have dramatically increased the volume of bug reports that companies receive about their own products, whether directly or through larger online bounty platforms.

GitHub, one of the world’s largest online code repositories, said it is tightening its definition of a “complete” bug report after a significant increase in AI-assisted submissions over the past year.

Although the influx has had some benefits, many reports are submitted without proof of concept, rely on unrealistic attack scenarios or cover issues already listed as ineligible. As a result, the company is having difficulty separating signal from noise.

“This isn’t unique to GitHub,” wrote Jarom Brown, senior product security engineer at GitHub. “Programs across the industry are grappling with the same challenge, and some have shut down entirely.”

Brown said GitHub does not want to ban AI-generated reports entirely, calling AI a “force multiplier” for security in the right context. But in a world where it’s never been easier to use AI to generate theoretical bugs, the company wants researchers to go the extra mile to confirm that their discoveries can actually be exploited in real-world conditions.

“What we need is the same standard we’ve always expected: validation,” Brown wrote. “An AI-assisted finding that’s been verified, reproduced, and submitted with a working proof of concept is a great submission. An unvalidated output submitted as-is without reproduction or demonstrated impact is not.”

Grant Bourzikas, chief security officer at Cloudflare, said triaging bugs and proving they can be exploited has always been one of the hardest parts of vulnerability research, and AI vulnerability scanners and code have “made it worse.”

For instance, code written in the C and C++ programming languages is vulnerable to a range of exploits – like buffer overflows and out-of-bounds reading and writing – that don’t exist in memory-safe languages like Rust. AI tools scanning software written in memory-unsafe programming languages are far more likely to generate false positives.

But one of the biggest flaws continues to be that AI tools are designed to give users what they’re asking for, even when it’s not there. This leads to bug reports filled with speculation and qualifiers around exploitability that require human follow-up.

“That’s a reasonable bias for an exploratory tool,” Bourzikas wrote. “It’s a ruinous one for a triage queue, where every speculative finding spends human attention and tokens to dismiss, and that cost compounds across thousands of findings.”

Cloudflare recently shared results from testing Mythos on 50 of its own code repositories, looking for exploits. Bourzikas called Mythos “a different kind of tool doing a different kind of work” from other frontier models, and said it made significant progress in reducing false positives.

For example, he pointed to two Mythos capabilities that stood out compared to other models: chaining exploits together and generating its own proof-of-concept code to confirm exploitability.

Older models could spot many of the same bugs, but they often couldn’t figure out how to exploit them effectively, or show that the issue could be exploited in real-world conditions.

Others have argued that the gap in bug-hunting capabilities between newer frontier AI models and older ones, or the open-source models available today, is not as large as advertised.

Swedish software developer Daniel Stenberg, lead developer for curl, an open source file transfer tool used around the world, recently wrote about his experience with Mythos Preview. Like others, he has also seen a higher volume of AI-fueled bug reports over the past year, but said the flood of low-quality reports has tapered off significantly since March as models have improved.

Curl is mature and polished by the standards of most software: Stenberg estimates each line of code has been rewritten or altered at least four times, and he said he has used both human and AI tools in the past to implement hundreds of bug fixes over curl’s existence.

That makes it a unique testing ground for the enhanced capabilities of Mythos, which was reportedly so powerful at finding vulnerabilities that Anthropic opted not to release it to the general public.

After gaining access to Mythos, Stenberg received the results of a scan of 178,000 lines of curl code. Ultimately, the scan flagged five “confirmed” vulnerabilities. Further exploration by human researchers found that four of the bugs were false positives or had no security impact. The one remaining bug Mythos found? A low-severity flaw that will be fixed in a regular June update.

Even as he praised the impact of AI on cybersecurity generally, Stenberg concluded that for all the hype, Mythos is only “a bit better” than previously released models.

“My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing,” he wrote. “I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos.”

The post AI might cut false positives, but it won’t stop the slop appeared first on CyberScoop.

If consequences matter, they should apply to vendors, too

By: Greg Otto
11 March 2026 at 06:00

Washington has rediscovered consequences. Just not consistently.

The March 6 executive order rests on a simple, correct idea: cyber-enabled fraud persists because it is profitable, scalable, and too often tolerated. So the government’s answer is to raise the cost. More coordination. More disruption. More prosecutions. More diplomatic pressure on the states that shelter these operations.

Good.

But weeks ago, an OMB memo rescinded earlier federal software supply chain memos issued during the Biden administration. In practice, that pulled back from the prior attestation-centered model and made tools like the Secure Software Development Attestation Form and SBOM requests optional rather than durable expectations.

Put plainly, we are getting tougher on the people exploiting digital systems while getting softer on the conditions that make those systems so easy to exploit.

The executive order gets something important right. Cyber-enabled fraud is not a collection of random online annoyances. It is an industrialized form of predation: ransomware, phishing, impersonation, sextortion, and financial fraud run as repeatable business models, often transnational and sometimes protected by permissive states. The order responds with a more centralized federal posture built around disruption, coordination, intelligence sharing, prosecution, resilience, and international pressure.

That is directionally correct. Criminal ecosystems do not retreat because we publish better guidance. They retreat when the cost of doing business rises.

But then we arrive at software.

The critique of the old federal assurance regime is not entirely wrong. Compliance can become theater. Bureaucracies are very good at turning legitimate security goals into rituals of form collection and checkbox management. Some skepticism was warranted. OMB says as much explicitly, arguing the prior model became burdensome and prioritized compliance over genuine security investment.

Still, the failure of bad compliance is not proof that accountability itself was the problem.

That is where the logic breaks. The administration is clearly willing to believe that criminal actors respond to deterrence. It is willing to use prosecutions, sanctions, visa restrictions, and coordinated pressure downstream. But upstream, where insecure technology shapes the terrain those criminals exploit, the theory suddenly changes. There, we are told to trust discretion. Local judgment. Flexible, risk-based decisions.

Sometimes that is wisdom. Often it is just a more elegant way of saying no one wants a hard requirement.

This is also why my own position has not changed. In a post I wrote in 2024, I argued that the industry did not need softer expectations or another round of polite encouragement. It needed more concrete action and consequences strong enough to change incentives. The problem was never that we were demanding too much accountability. The problem was that insecure software remained too cheap to ship.

That is the deeper issue. Cybercrime at scale does not thrive only because criminals exist. It thrives because the environment rewards them. Weak identity systems, brittle software, sprawling dependency chains, poor visibility, and diffuse accountability all make predation cheaper. The people who ship avoidable risk rarely absorb the full cost of it. Everyone else does.

So these two policy moves, taken together, reveal something uncomfortable. The government seems to believe in consequences for cybercriminals, but not quite in consequences for insecure production. It wants deterrence for the scammer, but discretion for the supplier.

A coherent cyber strategy would do both. It would aggressively disrupt criminal networks and also create meaningful pressure for secure-by-design production and procurement. It would recognize that punishing attackers matters, but so does changing the terrain that keeps making attack profitable.

The administration is right about one thing: cybercrime will not shrink until the costs of predation rise.

The unanswered question is why that logic should stop at the edge of the scam center.

Brian Fox is the co-founder and CTO of Sonatype.

The post If consequences matter, they should apply to vendors, too appeared first on CyberScoop.
