Review and Manage Security Findings in Trident

A finding is a discrete security issue that Trident has detected and flagged for your attention. Findings flow in from every detection surface: adversarial red-team campaigns, the runtime firewall, static analysis of your agent code, threshold-based monitors, your agent’s own self-reports, and cloud security scans. Every finding is scored with an OWASP Agentic Top-10 category and an AIVSS severity, so you always know what class of risk you are looking at and how urgent it is.

Finding sources

Source	Description
`REDTEAM`	Produced by a red-team campaign. Includes the full attack transcript and the specific skill that triggered the finding.
`FIREWALL`	A prompt or output was blocked by the runtime firewall. Includes the scanner verdict and matched rule.
`SAST`	Static analysis of your agent’s source code — hardcoded secrets, insecure tool configurations, vulnerable dependency patterns.
`MONITOR`	A metric or signal threshold was breached — for example, an unusual spike in refusals, token usage, or tool error rates.
`SELF_REPORT`	Your agent proactively reported an anomaly using `trident.selfReport()`.
`CSPM`	Cloud Security Posture Management scan — misconfigured IAM roles, public storage buckets, and other cloud resource misconfigurations.
`KSPM`	Kubernetes Security Posture Management scan — insecure pod specs, overly permissive RBAC, and cluster-level risks.
`IAC`	Infrastructure-as-code scan — security issues found in Terraform, CloudFormation, or similar IaC files before deployment.
`SECRET`	Secret detection scan — API keys, credentials, or tokens found in source code, environment configs, or deployment artifacts.
`VULN`	Dependency vulnerability scan — known CVEs in packages your agent depends on.
`RUNTIME`	Runtime security event — anomalous process or network behaviour detected in your agent’s execution environment.
`MCP`	MCPSafetyScanner audit of a Model Context Protocol server’s tool descriptions and permissions.

Severity scoring

Each finding carries two severity signals:

OWASP Agentic Top-10 category: maps the finding to the relevant risk in the OWASP Top 10 for LLM Applications and Agentic AI. The category tells you what class of risk the finding represents (e.g. LLM01: Prompt Injection, LLM06: Excessive Agency).
AIVSS score: the AI Vulnerability Scoring System is an extension of CVSS adapted for AI-specific attack characteristics. It accounts for factors like exploitability in agentic contexts, blast radius across connected tools, and data sensitivity.

The score (0.0–10.0) is mapped to a severity label:

Label	AIVSS range
`LOW`	0.1–3.9
`MEDIUM`	4.0–6.9
`HIGH`	7.0–8.9
`CRITICAL`	9.0–10.0
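
The bands in the table can be expressed as a small lookup. This is a minimal sketch, not part of the Trident SDK: the `severityLabel` function and `SeverityLabel` type are hypothetical names. Because the table assigns no label to a score of 0.0, the sketch returns `null` for it, which is an assumption.

```typescript
type SeverityLabel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Hypothetical helper: maps an AIVSS score to the label bands in the table.
// Scores outside 0.1–10.0 (including 0.0, which the table leaves unlabeled)
// return null.
function severityLabel(score: number): SeverityLabel | null {
  if (isNaN(score) || score < 0.1 || score > 10.0) return null;
  if (score >= 9.0) return "CRITICAL";
  if (score >= 7.0) return "HIGH";
  if (score >= 4.0) return "MEDIUM";
  return "LOW";
}
```

For example, a `riskScore` of 8.4 falls in the 7.0–8.9 band and maps to `HIGH`.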

Finding lifecycle

Findings move through a defined set of states:

OPEN → ACKNOWLEDGED → IN_PROGRESS → RESOLVED
                                  ↘ WONT_FIX
                                  ↘ DUPLICATE

Open — new finding, not yet reviewed
Acknowledged — you have seen it and it is on the radar
In progress — actively being remediated
Resolved — fixed; the finding auto-resolves after N consecutive passing replays against your live agent
Won’t fix — accepted risk; the finding is dismissed without remediation
Duplicate — grouped with an existing incident
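
If you mirror finding states in your own tooling, the diagram above can be encoded as a transition table. The sketch below takes the diagram literally, which is an assumption: Trident may permit other transitions (for instance, dismissing a finding straight from Open). `NEXT_STATES` and `canTransition` are hypothetical names, not SDK exports.

```typescript
type FindingStatus =
  | "OPEN"
  | "ACKNOWLEDGED"
  | "IN_PROGRESS"
  | "RESOLVED"
  | "WONT_FIX"
  | "DUPLICATE";

// Literal reading of the lifecycle diagram (assumption): the three terminal
// states are reached from IN_PROGRESS and have no outgoing transitions.
const NEXT_STATES: Record<FindingStatus, FindingStatus[]> = {
  OPEN: ["ACKNOWLEDGED"],
  ACKNOWLEDGED: ["IN_PROGRESS"],
  IN_PROGRESS: ["RESOLVED", "WONT_FIX", "DUPLICATE"],
  RESOLVED: [],
  WONT_FIX: [],
  DUPLICATE: [],
};

// Returns true when the diagram shows a direct arrow from `from` to `to`.
function canTransition(from: FindingStatus, to: FindingStatus): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}
```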

Finding replay

Every finding stores the original attack payload or triggering input. The Replay button re-runs that exact input against your current agent endpoint. Trident uses the same 3-judge ensemble as the red-team engine to evaluate the result.

If the replay passes (the agent handles it safely) N consecutive times, Trident automatically marks the finding as Resolved. This gives you an objective, automated signal that your fix actually worked, not just that you think it did.

To replay a finding manually, open it in the Findings inbox and click Replay. To configure the auto-resolve threshold (default: 3 consecutive passes), go to Project Settings → Findings.
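
The auto-resolve rule amounts to checking whether the most recent replays form an unbroken run of passes. The sketch below illustrates that logic only; `shouldAutoResolve` is a hypothetical name, and it assumes a failed replay resets the streak, which is what "consecutive" implies but is not stated explicitly.

```typescript
// Hypothetical helper: `results` holds replay outcomes in chronological order
// (true = pass). Returns true once the latest `threshold` results all pass.
// The default of 3 matches the documented default threshold.
function shouldAutoResolve(results: boolean[], threshold: number = 3): boolean {
  if (threshold < 1 || results.length < threshold) return false;
  for (let i = results.length - threshold; i < results.length; i++) {
    if (!results[i]) return false; // a failure inside the window breaks the streak
  }
  return true;
}
```

With the default threshold, the history pass, fail, pass, pass does not qualify, because only two passes follow the failure.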

Agent self-reporting

Your agent has more context about what just went wrong than any external monitor can have. The `trident.selfReport()` method lets your agent proactively send an observation to Trident as a first-class `SELF_REPORT` finding. Self-reported findings use the same downstream routing as every other finding: Slack alerts, GitHub issues, Jira tickets, and webhooks.

import { trident } from "@vouch-ai/sdk";

// Report an unexpected external tool call your agent detected at runtime.
await trident.selfReport({
  agentId: "prod-rag-bot",
  kind: "data-exfil-suspected",
  message: "Unexpected tool call to external URL detected in agent response",
  severity: "HIGH",
  metadata: {
    url: "https://attacker.example.com/collect",
    tool: "web_search",
    input: "the original user message that triggered this",
    output: "the agent response that contained the suspicious URL",
  },
});

The kind field is a kebab-case slug that becomes the finding’s category. Trident’s Sentinel triage agent recognises these standard kinds and surfaces richer evidence panels for them:

Kind	Meaning
`tool-call-failure`	A tool returned an error or timed out
`silent-tool-failure`	A tool returned empty/nil and the agent continued anyway
`refusal-spike`	The agent refused a request it normally handles
`hallucinated-policy`	The agent cited a policy or fact that doesn’t exist
`context-loss`	The agent forgot something from earlier in the conversation
`infinite-loop`	The agent repeated the same action N times
`data-exfil-suspected`	Outbound traffic to an unexpected destination
`prompt-injection`	Your own guardrail caught a prompt injection pattern

By default, `selfReport()` is fire-and-forget: it does not block your agent’s request path. Pass `await: true` if you need the promise to resolve before continuing.
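
Before sending a report, it can help to check that `kind` is a kebab-case slug and to know whether it is one of the standard kinds listed above. The helpers below are a sketch under assumptions: `isKebabCase` and `isStandardKind` are hypothetical names, and the exact slug rules Trident enforces are not documented here, so the regular expression is only one reasonable reading of "kebab-case".

```typescript
// The standard kinds from the table above.
const STANDARD_KINDS: string[] = [
  "tool-call-failure",
  "silent-tool-failure",
  "refusal-spike",
  "hallucinated-policy",
  "context-loss",
  "infinite-loop",
  "data-exfil-suspected",
  "prompt-injection",
];

// Assumed slug shape: lowercase alphanumeric words joined by single hyphens.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function isKebabCase(kind: string): boolean {
  return KEBAB_CASE.test(kind);
}

function isStandardKind(kind: string): boolean {
  return STANDARD_KINDS.indexOf(kind) !== -1;
}
```

A custom kind such as `rate-limit-hit` would pass the slug check but would not be in the standard list.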

Filtering and querying findings

The Findings inbox in the dashboard supports filters across every dimension:

Source — REDTEAM, FIREWALL, SAST, MONITOR, SELF_REPORT, CSPM, KSPM, IAC, SECRET, VULN, RUNTIME, MCP
Severity — LOW, MEDIUM, HIGH, CRITICAL
OWASP category — filter to a specific Agentic Top-10 code
Status — open, acknowledged, in progress, resolved
Agent — scope to a specific agent ID
Date range — last 7, 30, 90, or 180 days

Bulk actions

Select multiple findings in the inbox to apply actions in one step:

Confirm — mark all selected findings as confirmed true positives; each one generates a firewall deny rule within 5 minutes
Dismiss — mark as WONT_FIX or DUPLICATE
Replay — re-run all selected findings against your current agent

Retrieve findings via API

You can query your findings programmatically using the REST API.

curl "https://app.tryvouch.ai/api/public/trident/findings?source=REDTEAM&severity=HIGH&sinceDays=7" \
  -H "Authorization: Basic $(echo -n 'pk-...:sk-...' | base64)"

Response shape:

{
  "count": 2,
  "findings": [
    {
      "id": "find_01HX...",
      "agentId": "prod-rag-bot",
      "severity": "HIGH",
      "source": "REDTEAM",
      "category": "indirect_prompt_injection",
      "status": "OPEN",
      "title": "Indirect prompt injection via retrieved document",
      "traceId": "01HXZQ...",
      "redteamRunId": "a1b2c3d4-...",
      "certificateId": null,
      "createdAt": "2025-05-27T10:32:00.000Z",
      "riskScore": 8.4,
      "owaspCode": "LLM01"
    }
  ]
}
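
When consuming this response from TypeScript, a type inferred from the sample keeps downstream code honest. The interfaces below are a sketch derived only from the example payload: only `certificateId` is shown as `null`, and other fields may also be nullable or optional in practice. `parseFindings` is a hypothetical helper that returns findings ordered by `riskScore`, highest first.

```typescript
// Field list inferred from the sample response above (assumption: the real
// schema may include more fields or allow null in more places).
interface Finding {
  id: string;
  agentId: string;
  severity: string;
  source: string;
  category: string;
  status: string;
  title: string;
  traceId: string;
  redteamRunId: string;
  certificateId: string | null;
  createdAt: string;
  riskScore: number;
  owaspCode: string;
}

interface FindingsResponse {
  count: number;
  findings: Finding[];
}

// Parses a raw response body and sorts a copy of the findings by riskScore,
// highest first. No runtime validation is performed in this sketch.
function parseFindings(body: string): Finding[] {
  const parsed = JSON.parse(body) as FindingsResponse;
  return parsed.findings.slice().sort((a, b) => b.riskScore - a.riskScore);
}
```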

Query parameters:

Parameter	Type	Default	Description
`source`	string or array	—	Filter by source (e.g. `REDTEAM`, `FIREWALL`)
`severity`	string or array	—	Filter by severity level
`status`	string or array	—	Filter by lifecycle status
`agentId`	string	—	Scope to a specific agent
`redteamRunId`	string	—	Scope to a specific campaign run
`sinceDays`	integer	`30`	Lookback window, 1–180 days
`limit`	integer	`50`	Max results, 1–200

Incidents

Trident automatically groups related findings into incidents — similar to how Sentry groups error events by stack trace. When multiple findings share the same attack pattern, OWASP category, and affected agent, they are consolidated into a single incident so you triage one issue instead of dozens. The incident view shows the full timeline, the number of occurrences, and the affected findings in one place.
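
The grouping idea can be illustrated with a composite key. The sketch below is not Trident's actual algorithm, which is not described here: it assumes the `category` field stands in for the attack pattern and groups findings that share `agentId`, `owaspCode`, and `category`. `groupIntoIncidents` is a hypothetical name.

```typescript
interface GroupableFinding {
  id: string;
  agentId: string;
  category: string; // used here as a stand-in for "attack pattern" (assumption)
  owaspCode: string;
}

// Groups finding IDs under a composite key of agent, OWASP code, and category.
function groupIntoIncidents(
  findings: GroupableFinding[]
): { [key: string]: string[] } {
  const incidents: { [key: string]: string[] } = {};
  for (const f of findings) {
    const key = [f.agentId, f.owaspCode, f.category].join("|");
    const bucket = incidents[key] || (incidents[key] = []);
    bucket.push(f.id);
  }
  return incidents;
}
```

Two findings on the same agent with the same OWASP code and category end up in one group, while a finding on a different agent starts its own.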
