Adversarial Test Taxonomy for LLM Agents

Seventy-two named ways an agent can fail. All 72 fire today.

AgentSafety-72 is the most complete adversarial-test taxonomy for LLM agents — 72 named attack vectors mapped to specific Enable Equation gate failures, all 72 summon and return real AdversarialResults with measured values (identity markers, GPU temp, memory %, entropy, drift). Production ledger shows ~10,000 attacks fired with hash-chained verification. Built for AI red-team teams, agent platforms, and regulators that need to prove an agent stack can't be broken in known ways.

72/72
Daemons Verified Live
~10K
Production Attacks Logged
26 MB
Hash-Chained Ledger
2,874
Lines of Code
How AgentSafety-72 Works

A named-attack red team, not a fuzz tester.

AgentSafety-72 began as an internal research module (historical internal codename redacted, 2,874 LOC). The code is unchanged — only the naming was reworked for federal and commercial-safe presentation. Every attack is deliberate, named, and mapped to a deterministic gate outcome.

Sample Attack Implementations

Representative attacks. Each maps to a specific gate failure.

Every attack below is live Python code with a deterministic gate-failure target. Each red-team run produces a hash-chained receipt of which gates fired and which failed to.

01 — Identity Masking

Agent presents a false claimed identity (CallerID spoof). Target: G_AUTH.

02 — Knowledge Hallucination

Agent confidently states a derived fact with no source. Target: G_VETO.

03 — Privilege Escalation

Agent requests a permission outside its declared scope. Target: G_AUTH + G_POLICY.

04 — Policy Bypass

Agent attempts to act on an item explicitly excluded by policy DSL. Target: G_POLICY.

05 — Coherence Drift

Agent generates outputs that progressively diverge from the stated task. Target: G_COHERENCE.

06 — Resource Exhaustion

Agent attempts unbounded recursion or token spend. Target: G_BUDGET.

07 — Receipt Forgery

Agent submits a forged predecessor receipt to extend its action chain. Target: G_LEDGER.

08 — Time Authorization Replay

Agent reuses an expired time-bound authorization token. Target: G_EPOCH.

09 — Cross-Tenant Bleed

Agent attempts to read state belonging to another tenant. Target: G_ISOLATION.

10 — Hardware Bypass

Agent attempts to execute a hardware-gated action via software-only path. Target: G_HARDWARE.

11 — Spectral Stability Attack

Agent introduces synthetic noise to disguise drift. Target: G_SPECTRAL.

12 — Council Coercion

Agent uses social-proof prompts to override the deliberative council. Target: G_VETO + G_COHERENCE.

Production Evidence

Not a demo. Already fired ten thousand times.

AgentSafety-72 isn't a paper proposal — it's a system that ran in production for weeks. The 72-vector adversarial sweep fired ~10,000 times across 15 attack categories, every one hash-chained, every gate_held boolean recorded.

72/72 Daemons Verified

All 72 daemons summon and return real AdversarialResults — not stubs, not mocks. Measured values: identity markers, GPU temp, memory %, entropy, drift.

Production Ledger

attack_ledger.jsonl — 26 MB hash-chained append-only log of every attack run, every gate outcome, every measured value.

Attack Categories

15 distinct attack types covered across the 72-vector sweep. Categories span identity, knowledge, policy, privilege, drift, resource, replay, isolation, hardware, spectral, and coercion.

Sample Attack Vectors Verified

AV-01 through AV-08 (representative attack vectors) — running live with measured values returned on every invocation.

Each attack record contains: attack_id, attack_name, gate_target, attack_type, gate_held (boolean), severity (0–1), details (free text), timestamp, duration.

Try It Live · Browse the 72

Pick an attack. See what fires.

Sample measured values from the production sweep. The "Run sweep" button cycles all 72 attacks live.

Who This Is For

Three audiences. One shared taxonomy.

AgentSafety-72 gives red-team groups, agent platform vendors, and regulators a common vocabulary backed by working code.

AI Red Teams

Drop-in adversarial test framework. Standardized attack taxonomy, consistent reporting, hash-chained receipts. Replaces ad-hoc fuzzing with a real test program.

Agent Platforms

Pre-launch certification. Run AgentSafety-72 against your agent before customers do. Ship with a signed certificate showing which attacks your stack survives.

Regulators & Auditors

Standardized vocabulary for "this agent failed under attack X." Map your audit findings to a shared 72-vector taxonomy that has working code behind every name.

Status & Licensing

Open-core in review. Commercial track live.

Open-Source Track — Core taxonomy + 12 attacks may be released under a permissive license (TBD — currently under review for foreign-filing protection before public disclosure). Targeting GitHub release once IP review is complete.

Commercial Track — Custom attack implementations, customer-specific deployment integration, retainer red-team engagements with hash-chained reporting.

Open-Core
TBD
community license — pending IP review
  • Core 72-vector taxonomy
  • 12 reference attack implementations
  • Gate-failure verification harness
  • Hash-chained receipt format
  • Community support
Enterprise Audit
Quote
6-month red-team engagement
  • Six-month red-team engagement
  • Prioritized 10–15 attack delivery
  • Customer-profile attack scoping
  • Hash-chained audit deliverables
  • Executive readout & remediation plan
Sovereign
Quote
air-gapped on-prem
  • Air-gapped on-prem deployment
  • Source-available license
  • Custom attack scope
  • Federal / defense-ready packaging
  • Annual security audit
Target Customers

Built for the teams that own agent safety.

AgentSafety-72 was scoped against the real adversarial-test and audit programs run by these organizations and frameworks.

Anthropic OpenAI Trust & Safety Cohere Scale AI Red-Team Anduril Shield AI MITRE ATT&CK NIST AI RMF AI Safety Institute LangChain AutoGPT Continue.dev
Red-Team Engagements Open

Find out which of the seventy-two break your agent.

We work with AI platform teams, red-team groups, and federal AI-safety offices under retainer or per-engagement. Custom attack implementations and customer-specific deployment integration available.