home›Agentic Systems›GTIG AI Threat Tracker: Adversaries Weaponize AI for Cyber Attacks

From zero-day exploits to autonomous malware, state-sponsored and criminal actors increasingly use AI for vulnerability discovery, evasion, and operations.

GTIG AI Threat Tracker: Adversaries Weaponize AI for Cyber Attacks

Google Threat Intelligence Group's latest report reveals how adversaries exploit AI for vulnerability discovery, obfuscation, autonomous malware (PROMPTSPY), information operations, and supply chain attacks. Includes first identification of an AI-developed zero-day exploit.

May 13, 2026#Agents #Automation #LLM #Python #Security

GTIG AI Threat Tracker: Adversaries Weaponize AI for Cyber Attacks

The New Frontline: How Adversaries Are Weaponizing AI for Cyber Attacks

On May 11, 2026, Google Threat Intelligence Group (GTIG) released its latest AI Threat Tracker, revealing a dramatic escalation in how state-sponsored and cybercriminal actors are integrating artificial intelligence into every phase of the attack lifecycle. Drawing on insights from Mandiant incident response engagements, Gemini, and proactive research, the report paints a sobering picture: AI is no longer a novelty in the threat landscape—it is a force multiplier for vulnerability discovery, evasion, autonomous operations, and even information warfare.

"This is the first GTIG identification of a threat actor using a zero-day exploit believed to have been developed with AI—a mass exploitation event that may have been prevented by proactive counter discovery."

The report follows a February 2026 baseline and highlights accelerated adoption of large language models (LLMs) by clusters linked to China, North Korea, Russia, and Iran. From persona-driven jailbreaking to automated account registration, adversaries are systematically lowering the barrier to sophisticated attacks.

Vulnerability Discovery and Exploit Generation: State-Sponsored Precision

State-sponsored actors from the People's Republic of China (PRC) and the Democratic People's Republic of Korea (DPRK) are leading the charge in using AI for vulnerability research. Their approach is far from casual—they employ persona-driven jailbreaking and integrate specialized security datasets to coax LLMs into generating exploit code.

For example, threat actors have been observed directing models to act as a "senior security auditor" or "C/C++ binary security expert" for vulnerability research into embedded devices, including TP-Link firmware and Odette File Transfer Protocol (OFTP) implementations. One prompt recovered by GTIG reads:

You are currently a network security expert specializing in embedded devices, specifically routers. I am currently researching a certain embedded device, and I have extracted its file system. I am auditing it for pre-authentication remote code execution (RCE) vulnerabilities.

To supercharge this process, actors leverage the "wooyun-legacy" GitHub repository—a Claude code skill plugin containing a distilled knowledge base of over 85,000 real-world vulnerability cases from the WooYun platform (2010-2016). This enables in-context learning for code analysis and logic flaw identification.

Automation is rampant. APT45 (DPRK-nexus) sent thousands of repetitive prompts to recursively analyze CVEs and validate proof-of-concept exploits. Agentic tools like OpenClaw and OneClaw are used with intentionally vulnerable testing environments to refine AI-generated payloads. UNC2814 (PRC-nexus) also employed expert persona prompting for embedded device targets.

Cyber Crime: The First AI-Born Zero-Day?

Perhaps the most alarming finding is GTIG's identification of a threat actor using a zero-day exploit believed to have been developed with AI. The exploit targeted a popular open-source web-based system administration tool, aiming for mass exploitation—potentially thwarted by GTIG's proactive counter-discovery.

The Python script bypassed 2FA via a semantic logic flaw with a hardcoded trust assumption. Its characteristics strongly indicated AI generation: abundant educational docstrings, a hallucinated CVSS score, and a structured textbook Pythonic format with detailed help menus and a clean _C ANSI color class. GTIG coordinated responsible disclosure with the impacted vendor.

This case highlights a key LLM strength: identifying high-level semantic flaws and contextual reasoning on authorization logic—areas where traditional fuzzers and static analysis tools optimized for sinks and crashes fall short.

AI-Augmented Obfuscation and Evasion

Adversaries are also using LLMs to create malware that actively evades detection. GTIG documented several malware families with LLM-enabled capabilities:

Malware	Evasion/Obfuscation Type	Details
PROMPTFLUX	Dynamic Modification	Experimented with Gemini API for code generation
HONESTCUE	Evasion Payload Generation	Interacts with Gemini API for VBScript obfuscation; just-in-time self-modification to evade static signatures
CANFAIL	Decoy Logic	Russia-nexus, targeting Ukrainian organizations; LLM-generated decoy code with developer comments describing unused blocks as filler
LONGSTREAM	Decoy Logic	Russia-nexus, targeting Ukrainian organizations; large volume of decoy logic (e.g., 32 instances querying daylight saving status) to appear benign

Beyond malware, infrastructure tools are also being AI-augmented. APT27 (PRC-nexus) used Gemini to develop a fleet management application for an operational relay box (ORB) network, with a hardcoded maxHops parameter set to 3 hops and support for MOBILE_WIFI and ROUTER device types to route traffic through residential IP addresses via 4G/5G SIM cards.

Autonomous Malware Operations: PROMPTSPY

One of the most sophisticated examples is PROMPTSPY, an Android backdoor first identified by ESET that uses the Google Gemini API for autonomous operations. Its GeminiAutomationAgent module contains a hardcoded prompt assigning a benign persona and analyzes UI geometry via spatial mathematics. The module serializes the UI hierarchy to XML via the Accessibility API, sends it to the gemini-2.5-flash-lite model in JSON Mode, and parses the response for actions like CLICK or SWIPE.

Key capabilities include:

Biometric capture: Replays PINs and lock patterns for re-access.
Persistence: An AppProtectionDetector overlay shields the Uninstall button; Firebase Cloud Messaging (FCM) relaunches the app if inactive.
Operational resilience: Dynamic updates to C2 infrastructure, Gemini API keys, and VNC relay servers via C2 channel.

Google has disabled associated assets, confirmed no PROMPTSPY apps on Google Play, and relies on Google Play Protect (default on Android with Google Play Services) to block such threats.

AI-Augmented Research, Reconnaissance, and Information Operations

LLMs are also being used to streamline reconnaissance and attack lifecycle support. Actors generate organizational hierarchies, third-party relationships for high-fidelity phishing, and even identify hardware from photos. Agentic workflows—such as PRC-nexus actors using Hexstrike (with Graphiti memory system) and Strix against a Japanese tech firm—enable autonomous pivoting with tools like subfinder and httpx.

In the information operations (IO) domain, actors from Russia, Iran, China, and Saudi Arabia are researching AI for content creation, localization, and narrative audio generation. While no AI-generated content has been found in the wild yet, tooling advances are clear. "Operation Overload," a pro-Russia campaign, used suspected AI voice cloning to impersonate journalists, splicing authentic video with fabricated audio.

Obfuscated and Scalable LLM Access

To mask their activities, threat actors have evolved from simple API calls to sophisticated middleware and proxy relays. Automated registration scripts, anti-detect browsers, and account-pooling are now common. For example, UNC6201 (PRC-nexus) created a GitHub Python script for automated premium LLM account registration and cancellation, complete with CAPTCHA bypass and SMS verification.

GTIG catalogued a range of tool types used by actors like UNC5673 (PRC-nexus, overlaps TEMP.Hex) targeting South and Southeast Asian governments:

Tool Type	Function	Examples
API Gateways & Aggregators	Consolidate API keys into OpenAI-compatible endpoint; resell access, mask traffic	CLIProxyAPI, Claude Relay Service, CLIProxyAPIPlus, OmniRoute
LLM Account Provisioning	Automate account creation/verification; Sybil attacks for free-tier credits	ChatGPT Account Auto-Registration Tool, AWS-Builder-ID
Client Interfaces	User-friendly LLM interaction; manage proxies/multi-accounts	Cherry Studio, EasyCLI, Kelivo
Infrastructure Management	Control API proxies, logging, quotas; C2 for scaled access	CLIProxyAPI ManagementCenter
Anti-Detection & Masking	Isolate fingerprints; evade bot detection	Roxy Browser

Mitigation advice from GTIG: analyze network data for API aggregator patterns to detect and block such infrastructure.

AI as a Target: Supply Chain Attacks

The report also warns of AI environments themselves becoming targets. Per Google's Secure AI Framework (SAIF), risks include Insecure Integrated Components (IIC) and Rogue Actions (RA). Attackers compromise AI software dependencies for initial access, then pivot to networks for ransomware and extortion. Notable incidents include weaponized OpenClaw skill packages (February 2026) with hidden code execution, and compromised code packages by "TeamPCP" (UNC6780) targeting GitHub repos and Actions (Trivy, Checkmarx, LiteLLM, BerriAI) via malicious PyPI packages and PRs embedding the SANDCLOCK credential stealer.

As AI capabilities continue to democratize offensive power, the GTIG report serves as a critical wake-up call: defenders must now assume that adversaries have access to the same—or better—AI tools, and adapt their strategies accordingly.