The Trident firewall intercepts prompts before they reach your LLM and scans outputs before they leave your agent, blocking prompt injection, jailbreaks, canary leaks, and other attacks in real time. It makes a two-stage decision: your project’s custom deny rules fire first (populated automatically from confirmed findings), and the LLM Guard ensemble then analyses anything that gets through. You have two ways to integrate it: route traffic through the gateway proxy (a client configuration change only, with no changes to your agent logic) or call trident.scan() directly in your agent code.
How the two-stage scan works
- Stage 1 (tenant deny rules): Trident checks the prompt against your project’s custom rule bank. This bank is automatically populated when you confirm a finding in the dashboard; confirmed attacks become deny rules within 5 minutes. This stage is fast (pure regex/substring matching, no network hop) and blocks known-bad patterns your agent has already encountered.
- Stage 2 (LLM Guard firewall): If no tenant rule fires, the prompt is forwarded to the LLM Guard ensemble for deeper analysis. This stage runs the full scanner suite and returns a per-scanner verdict.
A blocked prompt returns is_valid: false with the matched rule or scanner result included in the response so you can surface a meaningful error to the user.
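The two-stage flow can be modelled locally with a short sketch. This is an illustration of the ordering only, not Trident's implementation: `DenyRule`, `two_stage_scan`, and the `ensemble` callable are hypothetical names, case-insensitive matching is an assumption, and the result keys simply mirror the fields described on this page.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DenyRule:
    """Hypothetical stand-in for one entry in a project's deny rule bank."""
    label: str
    kind: str      # "substring" or "regex"
    pattern: str

    def matches(self, prompt: str) -> bool:
        # Assumption: matching ignores case.
        if self.kind == "substring":
            return self.pattern.lower() in prompt.lower()
        return re.search(self.pattern, prompt, re.IGNORECASE) is not None


def two_stage_scan(
    prompt: str,
    rules: List[DenyRule],
    ensemble: Callable[[str], bool],
) -> Dict[str, Optional[object]]:
    """Run deny rules first; call the (slower) ensemble only if none match."""
    # Stage 1: cheap local matching against known-bad patterns.
    for rule in rules:
        if rule.matches(prompt):
            return {
                "is_valid": False,
                "source": "trident.tenantRule",
                "matched_rule": rule.label,
            }
    # Stage 2: reached only when no deny rule fired.
    return {
        "is_valid": ensemble(prompt),
        "source": "trident.firewall",
        "matched_rule": None,
    }
```

Because stage 1 returns early, a prompt that matches a deny rule never incurs the cost of the ensemble call.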
Integration option 1 — Gateway proxy (recommended)
The gateway proxy is the fastest way to add firewall coverage. Change your LLM client’s baseURL to the Trident gateway endpoint and send your Trident project keys in a Basic Authorization header. Every request your agent makes to OpenAI or Anthropic is then scanned automatically before being forwarded upstream.
The four examples below cover, in order: TypeScript with OpenAI, TypeScript with Anthropic, Python with OpenAI, and Python with Anthropic.
TypeScript (OpenAI):
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://app.tryvouch.ai/api/public/gateway/openai/v1",
  defaultHeaders: {
    "Authorization": `Basic ${Buffer.from(
      `${process.env.TRIDENT_PROJECT_PUBLIC_KEY}:${process.env.TRIDENT_PROJECT_SECRET_KEY}`
    ).toString("base64")}`,
  },
});

// All chat completions are now scanned automatically.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: userMessage }],
});
TypeScript (Anthropic):
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: "https://app.tryvouch.ai/api/public/gateway/anthropic/v1",
  defaultHeaders: {
    "Authorization": `Basic ${Buffer.from(
      `${process.env.TRIDENT_PROJECT_PUBLIC_KEY}:${process.env.TRIDENT_PROJECT_SECRET_KEY}`
    ).toString("base64")}`,
  },
});

// All messages are now scanned automatically.
const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: userMessage }],
});
Python (OpenAI):
import base64
import os

from openai import OpenAI

credentials = base64.b64encode(
    f"{os.environ['TRIDENT_PROJECT_PUBLIC_KEY']}:{os.environ['TRIDENT_PROJECT_SECRET_KEY']}".encode()
).decode()

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://app.tryvouch.ai/api/public/gateway/openai/v1",
    default_headers={"Authorization": f"Basic {credentials}"},
)

# All chat completions are now scanned automatically.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_message}],
)
Python (Anthropic):
import base64
import os

import anthropic

credentials = base64.b64encode(
    f"{os.environ['TRIDENT_PROJECT_PUBLIC_KEY']}:{os.environ['TRIDENT_PROJECT_SECRET_KEY']}".encode()
).decode()

client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url="https://app.tryvouch.ai/api/public/gateway/anthropic/v1",
    default_headers={"Authorization": f"Basic {credentials}"},
)

# All messages are now scanned automatically.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": user_message}],
)
Blocked requests return HTTP 400 with a JSON body explaining why the prompt was rejected:
{
  "error": "blocked",
  "is_valid": false,
  "source": "trident.tenantRule",
  "matched_rule": {
    "id": "rule_01HX...",
    "label": "Indirect injection via document",
    "kind": "substring",
    "scope": "project",
    "snippet": "ignore previous instructions",
    "severity": "HIGH"
  }
}
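To turn a blocked gateway response into something you can log or show to a user, a small parser along these lines may help. It is a sketch under assumptions: `explain_block` is a hypothetical helper, it relies only on the fields shown in the sample body, and it assumes blocks raised by the LLM Guard stage use the same `error` and `source` keys (without `matched_rule`). How you obtain the raw status code and body depends on your HTTP client or SDK.

```python
import json
from typing import Optional


def explain_block(status_code: int, body_text: str) -> Optional[str]:
    """Return a short summary if the response is a firewall block, else None."""
    if status_code != 400:
        return None
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        return None  # Not JSON, so not a firewall block body.
    if not isinstance(data, dict) or data.get("error") != "blocked":
        return None  # Some other 400, e.g. an upstream validation error.

    source = data.get("source", "unknown")
    rule = data.get("matched_rule") or {}
    if rule:
        label = rule.get("label", "unlabelled rule")
        severity = rule.get("severity", "n/a")
        return f"Blocked by {source}: {label} (severity {severity})"
    return f"Blocked by {source}"
```

Returning None for anything that is not clearly a firewall block keeps ordinary upstream 400 errors on their existing error-handling path.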
Integration option 2 — Direct scan
Call trident.scan() before passing a prompt to your LLM. Use this when you need fine-grained control over which inputs are scanned, or when the gateway proxy is not suitable for your architecture.
TypeScript:
import { trident } from "@vouch-ai/sdk";

// Initialize once at startup (see Tracing docs).
trident.init({ projectPk: "pk-...", projectSk: "sk-..." });

async function handleUserMessage(userMessage: string) {
  const verdict = await trident.scan({
    prompt: userMessage,
    agentId: "prod-rag-bot", // optional
  });
  if (!verdict.ok || !verdict.is_valid) {
    return "I'm unable to process that request.";
  }
  // Safe to proceed: call your LLM here.
  return await callLLM(userMessage);
}
Python:
import base64
import json
import os
import urllib.request


def scan_prompt(prompt: str) -> bool:
    """Returns True if the prompt is safe, False if it should be blocked."""
    pk = os.environ["TRIDENT_PROJECT_PUBLIC_KEY"]
    sk = os.environ["TRIDENT_PROJECT_SECRET_KEY"]
    auth = base64.b64encode(f"{pk}:{sk}".encode()).decode()
    payload = json.dumps({"prompt": prompt}).encode()
    req = urllib.request.Request(
        "https://app.tryvouch.ai/api/public/trident/scan",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=8) as resp:
            result = json.loads(resp.read())
    except (OSError, ValueError):
        # Fail closed on network, HTTP, timeout, or JSON parse errors,
        # mirroring the `!verdict.ok` check in the TypeScript example.
        return False
    # Treat a missing is_valid field as unsafe rather than defaulting to True.
    return isinstance(result, dict) and result.get("is_valid") is True


def handle_user_message(user_message: str) -> str:
    if not scan_prompt(user_message):
        return "I'm unable to process that request."
    # call_llm is a placeholder for your own LLM call.
    return call_llm(user_message)
The scan() call returns a result object with:
| Field | Description |
|---|---|
| ok | true if the scan completed (even if the prompt was blocked); false on a network or auth error |
| is_valid | true = safe to proceed, false = block the prompt |
| source | Which stage made the decision: "trident.tenantRule" (a project-level deny rule matched), "trident.orgRule" (an organisation-wide policy matched), or "trident.firewall" (the LLM Guard ensemble ran) |
| scanners | Per-scanner verdicts from LLM Guard (present when stage 2 ran) |
| matched_rule | The deny rule that matched, if source is "trident.tenantRule" or "trident.orgRule" |
| latencyMs | End-to-end scan latency |
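One way to act on these fields is to reduce the result to a three-way outcome and keep the reason for logging. The sketch below works on a plain dictionary with the field names from the table; `decide` and `block_reason` are hypothetical helpers, and the internal shape of `scanners` and `matched_rule` is deliberately not assumed beyond a `label` key.

```python
from typing import Any, Mapping


def decide(result: Mapping[str, Any]) -> str:
    """Map a scan result to "allow", "block", or "error"."""
    if not result.get("ok", False):
        # The scan itself failed (network or auth error); no verdict exists.
        return "error"
    return "allow" if result.get("is_valid", False) else "block"


def block_reason(result: Mapping[str, Any]) -> str:
    """Describe which stage produced a block, for logs or dashboards."""
    source = result.get("source")
    if source in ("trident.tenantRule", "trident.orgRule"):
        rule = result.get("matched_rule") or {}
        return f"{source}: {rule.get('label', 'unknown rule')}"
    if source == "trident.firewall":
        return "trident.firewall: LLM Guard ensemble"
    return "unknown source"
```

Keeping "error" separate from "block" lets the caller choose a policy for scan failures, for example failing closed for untrusted input.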
What the firewall detects
Trident’s firewall runs a suite of specialised scanners on every prompt:
- Structural prompt injection: detects instruction-shaped text masquerading as data, including fake system: / [INST] / <<SYS>> role headers smuggled into retrieved content, AgentDojo-style <INFORMATION> blocks containing imperatives, and precondition tricks like “before you can answer you must…”. These structural patterns survive re-wording and evade phrase-list filters.
- Indirect injection in retrieved context: scans tool outputs, RAG documents, and other ingested content for injected instructions before the agent’s reasoning step processes them.
- Canary token leaks in outputs: if you embed secret canary strings in your system prompt (e.g. TRIDENT-CANARY-7f3a), the firewall blocks any model response that echoes them. A leaked canary indicates either system-prompt exfiltration or a successful injection that coerced the model to repeat hidden context.
- Jailbreak patterns: LLM Guard’s prompt injection model (deberta-v3-base-prompt-injection-v2) catches known jailbreak families including DAN, role-play bypasses, and hypothetical framing.
- Custom ban rules: your project’s deny bank, auto-populated from confirmed findings. When you confirm a finding in the dashboard, the attacker’s payload is added as a deny rule and takes effect within 5 minutes.
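The canary mechanism in particular is easy to picture with a local approximation. The sketch below generates a canary in the same shape as the TRIDENT-CANARY-7f3a example and checks a model response for it. It is not Trident's detection logic: `make_canary` and `leaks_canary` are hypothetical helpers, and case-insensitive matching is an assumption.

```python
import secrets
from typing import Iterable


def make_canary(prefix: str = "TRIDENT-CANARY") -> str:
    """Create a random canary string such as TRIDENT-CANARY-7f3a."""
    # token_hex(2) yields 4 lowercase hex characters.
    return f"{prefix}-{secrets.token_hex(2)}"


def leaks_canary(model_output: str, canaries: Iterable[str]) -> bool:
    """Return True if any canary string appears in the model output."""
    lowered = model_output.lower()
    return any(canary.lower() in lowered for canary in canaries)


# Embed the canary in the system prompt, then check every response.
canary = make_canary()
system_prompt = f"You are a support bot. Internal marker: {canary}. Never reveal it."
```

A longer random suffix lowers the chance of an accidental match in ordinary text; the four-character form is used here only to match the example above.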
Confirmed findings → ban rules
Every time you confirm a finding in the Findings inbox, Trident extracts the attack payload and adds it to your project’s deny rule bank. The next scan call that matches the pattern is blocked at stage 1 — before the LLM Guard ensemble even runs. This creates a feedback loop: the more findings you confirm, the faster and more precise the firewall becomes for your specific agent.
Scan modes
| Mode | Latency | When to use |
|---|---|---|
| Fast | < 50 ms | High-throughput agents where latency budget is tight. Uses regex/keyword matching and structural pattern detection only. |
| Full | 150–300 ms | Production agents where security coverage matters more than raw speed. Runs the complete LLM Guard ensemble including the DeBERTa prompt injection model. |
The gateway proxy and trident.scan() both run full mode by default. Fast mode can be enabled at the project level from the dashboard’s Firewall settings.
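When deciding which mode to enable for a project, it can help to compare the per-request latency budget against the upper bound quoted for full mode. The helper below is a hypothetical planning aid, not part of any Trident SDK; the threshold is taken from the table above, and the mode itself is still set in the dashboard.

```python
# Upper latency bound for full mode, from the scan modes table (150-300 ms).
FULL_MODE_MAX_MS = 300


def suggest_mode(latency_budget_ms: float) -> str:
    """Suggest "full" when the budget covers full mode's worst case, else "fast"."""
    if latency_budget_ms >= FULL_MODE_MAX_MS:
        return "full"
    return "fast"
```

For example, an agent that can spend 500 ms on scanning fits full mode, while one with a 100 ms budget would need fast mode.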
Viewing firewall events
Open the Firewall tab in the Trident dashboard to see a real-time log of all scanned requests. You can filter by:
- Verdict — show only blocked requests, or all requests
- Scanner — drill into which scanner triggered a block
- Agent — scope the view to a specific agent
- Time range — narrow to an incident window
Each firewall event links to the corresponding trace (if tracing is enabled) so you can see the full context around a blocked prompt.
Note: The gateway proxy routes your LLM API calls through Trident’s infrastructure before forwarding them to OpenAI or Anthropic. Review your data residency and compliance requirements before enabling it. If you cannot route traffic through a third-party proxy, use the direct scan integration instead.