Attack categories
Trident’s attack library covers 200+ distinct vectors across 10 categories. Each category maps directly to an OWASP Agentic Top-10 risk.| Category | Description |
|---|---|
| Prompt injection | Direct instructions embedded in user input that attempt to override the agent’s system prompt |
| Jailbreaks | 16+ patterns that attempt to disable safety guardrails through persona hijacking, hypothetical framing, and authority spoofing |
| Encoding bypass | 15+ obfuscation techniques — Base64, ROT13, Unicode homoglyphs, zero-width characters — designed to evade keyword-based filters |
| RAG poisoning | Adversarial documents injected into retrieval corpora to redirect the agent’s reasoning at query time |
| Multi-turn social engineering | Gradual escalation sequences that build rapport and authority across many turns before attempting exploitation |
| Tool-call hijack | Payloads that coerce the agent into calling a tool with attacker-controlled arguments |
| MCP exploitation | Attacks targeting Model Context Protocol tool descriptions and server responses |
| Sandbox escape via tools | Attempts to execute unintended system commands or access out-of-scope resources through tool interfaces |
| Indirect prompt injection | Instructions embedded in retrieved content (web pages, documents, email bodies, database rows) that the agent processes as trusted data |
| Resource exhaustion | Inputs designed to cause excessive token consumption, infinite loops, or cost amplification |
Run a campaign from the dashboard
Open the Red-Team tab
Navigate to the Red-Team tab in the Trident dashboard.
Select your agent
Choose the agent you want to test from the agent selector. If you have already called
trident.init() with an agentId, the agent appears in the list automatically.Configure the attack scope
Choose a scan mode:
- Quick — a targeted subset of high-signal attack vectors, suitable for fast feedback during development
- Standard — broad coverage across all 10 categories, recommended for pre-deployment gates
- Deep — exhaustive multi-phase assessment with up to 300 turns per attack class
- Exhaustive — every skill in the library, audit-grade coverage for quarterly reviews
Trigger a campaign via API
You can enqueue a red-team campaign programmatically using the REST API. This is the recommended approach for CI/CD pipelines. Start a campaign:Poll campaign status
Use thejobId returned by the POST endpoint to check the campaign’s progress:
state field progresses through queued → active → completed (or failed). Once the state is completed, query the findingsUrl to retrieve all findings from the run.
CI/CD integration
You can gate deployments on red-team results by calling the campaign API in your pipeline and polling until the run completes. For a full example with GitHub Actions and a pass/fail threshold, see the CI/CD integration guide.Campaign results
All findings from a red-team campaign appear in the Findings inbox with:- Attack transcript — the full multi-turn conversation between the Trident attacker and your agent
- Severity — AIVSS score mapped to
LOW,MEDIUM,HIGH, orCRITICAL - OWASP category — the Agentic Top-10 code (e.g.
LLM01,LLM06) that the finding maps to - Skill ID — the specific attack vector that triggered the finding
Orchestrators and judging
Trident uses three orchestration strategies depending on the attack category:- Crescendo — gradually escalates prompts across multiple turns, building context and authority before attempting exploitation. Effective for social engineering and multi-turn attacks.
- Converter pipeline — applies deterministic encoding and obfuscation transforms to payloads. Covers the full encoding bypass category.
- Multi-agent campaign — coordinates independent attacker agents working different angles simultaneously.
- Deterministic judge — regex and keyword rules that flag known-bad patterns with high precision
- Tool-oracle judge — inspects actual tool call arguments to detect hijacks that a text-only judge would miss
- LLM judge — a language model that evaluates whether the agent’s response constitutes a successful exploitation