
New Framework Unveiled to Assess AI’s Emerging Cyberattack Capabilities

Mountain View, March 31, 2025 – In a study released today by Google DeepMind, researchers have introduced a comprehensive framework designed to evaluate the growing cyberattack capabilities of artificial intelligence (AI), particularly as frontier models inch closer to artificial general intelligence (AGI). Titled A Framework for Evaluating Emerging Cyberattack Capabilities of AI, the paper, authored by Mikel Rodriguez, Raluca Ada Popa, Four Flynn, Lihao Liang, Allan Dafoe, and Anna Wang, presents a systematic approach to understanding and mitigating the risks AI poses in the cybersecurity domain. Published on arXiv, the study leverages real-world data and novel evaluation techniques to address a critical gap in current AI safety efforts.

A Structured Approach to AI Cyber Risks

The framework builds on established cybersecurity models, such as the Cyberattack Chain and the MITRE ATT&CK framework, to assess AI’s potential across the entire lifecycle of a cyberattack—from reconnaissance to exploitation and beyond. Unlike previous ad-hoc evaluations, which often focused narrowly on specific skills like vulnerability exploitation, this new approach examines the end-to-end attack chain. It aims to identify where AI could lower the barriers to entry for malicious actors, enhance attack efficiency, and introduce novel threats through autonomous systems.
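To make the phase-by-phase idea concrete, here is a minimal sketch of how an evaluation might record model performance across an ordered attack chain. The phase names follow the familiar kill-chain vocabulary; everything else (class and function names, the scoring scheme) is hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): record, per attack
# phase, how many evaluation challenges a model solved, so defenders can
# see where along the chain AI lowers attacker cost.
from dataclasses import dataclass

ATTACK_PHASES = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command_and_control", "actions_on_objectives",
]

@dataclass
class PhaseResult:
    phase: str
    solved: int   # challenges the model completed in this phase
    total: int    # challenges evaluated for this phase

    @property
    def success_rate(self) -> float:
        # Fraction of this phase's challenges the model solved.
        return self.solved / self.total if self.total else 0.0

def chain_coverage(results: list[PhaseResult]) -> dict[str, float]:
    """Per-phase success rates across the whole attack chain."""
    return {r.phase: r.success_rate for r in results}
```

A per-phase view like this is what distinguishes an end-to-end evaluation from the narrower, single-skill tests the authors criticize.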

Central to the study is an analysis of over 12,000 real-world instances of AI misuse in cyberattacks, cataloged by Google’s Threat Intelligence Group. From this data, the team curated seven representative cyberattack archetypes—including phishing, malware, denial-of-service (DoS), man-in-the-middle (MitM), SQL injection, and zero-day attacks—each chosen for its prevalence, severity, and potential to benefit from AI advancements. A bottleneck analysis pinpointed phases where AI could disrupt traditional cost structures, such as automating reconnaissance or crafting sophisticated phishing campaigns.

A Benchmark of 50 New Challenges

To test their framework, the researchers developed a benchmark of 50 unique cybersecurity challenges, crafted in collaboration with external partners to avoid contamination from public datasets. These challenges span various attack phases and difficulty levels, from “strawman” tasks to complex “hard” scenarios requiring advanced expertise. The benchmark was tested on Gemini 2.0 Flash experimental, a frontier AI model, which solved 12 of the 50 challenges (2/2 Strawman, 6/8 Easy, 4/28 Medium, 0/12 Hard), an overall success rate of 24%.
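As a quick arithmetic check, the per-difficulty breakdown reported above can be totaled directly:

```python
# Summing the reported per-difficulty breakdown:
# 2/2 strawman, 6/8 easy, 4/28 medium, 0/12 hard.
breakdown = {"strawman": (2, 2), "easy": (6, 8), "medium": (4, 28), "hard": (0, 12)}

solved = sum(s for s, _ in breakdown.values())
total = sum(t for _, t in breakdown.values())
rate = solved / total
print(solved, total, round(rate * 100))  # prints: 12 50 24
```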

The results suggest that while current AI models excel at tasks like evasion (40% success) and malware development (30% success), they struggle with high-impact, end-to-end attacks requiring sustained reasoning or syntactic precision. Common failure modes included syntactic errors in long command sequences and reliance on generic attack strategies, indicating that today’s AI lacks the sophistication for real-world cyber operations.

Insights for Defenders

The study’s findings underscore AI’s potential to amplify cyberattacks by increasing speed, scale, and throughput rather than enabling disruptive, game-changing threats—at least for now. However, it also highlights overlooked risks, such as AI’s ability to enhance evasion, obfuscation, and persistence—phases often underestimated in prior evaluations.

For cybersecurity defenders, the framework offers practical tools: a method to assess gaps in threat coverage, prioritize targeted mitigations, and conduct AI-enabled adversary emulation for red teaming. By mapping AI capabilities to specific attack phases, organizations can focus resources on high-risk areas, such as improving detection of stealthy command-and-control channels or countering AI-generated phishing lures.
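One way to picture the prioritization step is as a simple ranking: phases where the model's measured success is high and defensive coverage is weak deserve resources first. The sketch below is hypothetical—the paper does not prescribe this formula, and the coverage-gap numbers are invented for illustration; only the evasion (40%) and malware-development (30%) success rates come from the reported results.

```python
# Hypothetical prioritization sketch: rank attack phases by
# (model success rate) x (defensive coverage gap). The gap values
# below are invented placeholders, not data from the study.
ai_success = {"evasion": 0.40, "malware_development": 0.30, "reconnaissance": 0.20}
coverage_gap = {"evasion": 0.7, "malware_development": 0.3, "reconnaissance": 0.5}

risk = {phase: ai_success[phase] * coverage_gap[phase] for phase in ai_success}
priorities = sorted(risk, key=risk.get, reverse=True)  # highest risk first
```

With real measurements plugged in, a ranking like this would point a red team at the phases—say, stealthy command-and-control—where AI assistance most outpaces current defenses.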

Looking Ahead to AGI Security

As AI progresses toward AGI, the researchers anticipate a significant evolution in cyberattack capabilities, potentially collapsing the costs of previously resource-intensive stages like zero-day vulnerability discovery. Their framework is designed to adapt to this shifting landscape, providing a scalable resource for defenders to stay ahead of AI-enabled adversaries.

“This is the most comprehensive AI cyber risk evaluation framework published to date,” the authors assert, emphasizing its role in bridging the gap between identifying risks and deploying effective defenses. They call for a community-wide effort to implement robust safeguards, from model-level safety fine-tuning to evolving defensive tactics that account for AI-driven changes in adversary behavior.

With cybersecurity increasingly intertwined with AI innovation, Google DeepMind’s work signals a proactive step toward securing the digital future—one where the promise of AI is balanced against its perils.