Blog

R&D writeups from the Vigil Guard engineering team: architectural decisions, calibration methodology, attack-vector analysis, and the failure modes we hit while running AIDR (AI Detection & Response) in production.

By Tomasz Bartel

Vigil Guard 1.8.x across nine public prompt-injection benchmarks

We are wrapping up testing of 1.8.x. We ran it against nine public, external prompt-injection benchmarks and used no internal dataset of our own. On indirect and RAG attacks, recall ranges from 98.9% to 99.2%; on JailbreakBench it is 100%. The false-positive rate on deepset is 0.0%, and over-defense on NotInject is 2.6%. Every result reproduces from its linked source.

Tags: LLM Security, Prompt Injection, Benchmarks, RAG Security, AIDR
By Tomasz Bartel

vge-promptguard-v2h: the end of fine-tuning for production guardrails

After six weeks of trying every standard catastrophic-forgetting technique on our production prompt-injection detector, we abandoned fine-tuning entirely. Instead, v2h ships two specialized models and a deterministic router. There is no regression on the old distribution and full coverage of the new one.

Tags: LLM Security, Prompt Injection, Guardrails, Catastrophic Forgetting, AIDR