Blog

R&D writeups from the Vigil Guard engineering team: architectural decisions, calibration methodology, attack-vector analysis, and the failure modes we hit while running AIDR (AI Detection & Response) in production.

Published June 15, 2026 · by Tomasz Bartel

Vigil Guard 1.8.x across nine public prompt-injection benchmarks

We are wrapping up testing of 1.8.x. We ran it against nine public, external prompt-injection benchmarks, with no internal dataset of our own. On indirect and RAG attacks, recall runs from 98.9% to 99.2%; on JailbreakBench it reaches 100%. The false-positive rate on deepset is 0.0%, and over-defense on NotInject is 2.6%. Every result reproduces from its linked source.

LLM Security · Prompt Injection · Benchmarks · RAG Security · AIDR

Published May 3, 2026 · by Tomasz Bartel

vge-promptguard-v2h: the end of fine-tuning for production guardrails

After six weeks of trying every standard catastrophic-forgetting technique on our production prompt-injection detector, we abandoned fine-tuning entirely. v2h ships two specialized models and a deterministic router instead. No regression on the old distribution, full coverage of the new one.

LLM Security · Prompt Injection · Guardrails · Catastrophic Forgetting · AIDR

Published April 29, 2026 · by Tomasz Bartel

Semantic Drift Analysis as a Supporting Mechanism for LLM Security in the Context of Prompt Injection

How semantic drift analysis complements traditional prompt injection detection in LLM-based systems. When a model's output deviates from its defined role and constraints, the deviation itself becomes a security signal.

LLM Security · Prompt Injection · Semantic Drift · Agent Security · AIDR