
ShieldX v0.5.0 — Self-Evolving LLM Prompt Injection Defense (Open Source)

An overview of ShieldX v0.5.0, an open-source, self-evolving defense against prompt injection attacks on large language models.

Rene Fichtmueller / 2026-04-07 / ~4 min read

Why ShieldX?

Existing prompt injection defense tools cover fragments of the problem. None combines self-learning pattern evolution, kill chain classification, MCP tool-call protection, adversarial training, and automatic self-healing into one coherent pipeline. ShieldX fills that gap — and it’s the only open-source tool that continuously evolves its own detection patterns without human intervention.

The Numbers (v0.5.0)

70.8% True Positive Rate across 12 attack corpus categories
0.0% False Positive Rate on production-representative benign inputs
369+ detection rules in 12 categories
90 MITRE ATLAS techniques mapped across 8 tactics
20+ languages covered (including DE, FR, ES, RU, JA, KO, AR, PT, TR, TH, HI, IT, NL, PL, VI), plus homoglyph and polyglot detection
<50 ms full-pipeline latency when GPU-dependent layers are excluded

10-Layer Defense Pipeline

Every input passes through 10 layers, some sequential and some run in parallel, before reaching your LLM:

Layer 0 — Preprocessing: Unicode NFKC normalization, zero-width character removal, cipher decoding (ROT13, Base64, hex pairs, binary octets, leet speak, word reversal), tokenizer deobfuscation (I.g.n.o.r.e-style attacks, dash-split words), and compressed payload detection.
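To make three of the Layer 0 steps concrete, here is a minimal sketch of NFKC normalization, zero-width character removal, and collapsing of single letters split by dots or dashes. The function and constant names are hypothetical and not taken from the ShieldX codebase, and the split-letter regex is a deliberately simple assumption that would also collapse abbreviations such as "U.S.A".

```typescript
// Zero-width and invisible characters that are often used to split keywords.
const ZERO_WIDTH_CHARS = /[\u200B-\u200D\u2060\uFEFF]/g;

// Single letters chained with "." or "-", e.g. "I.g.n.o.r.e" or "i-g-n-o-r-e".
const SPLIT_LETTER_RUN = /\b(?:[A-Za-z][.-]){2,}[A-Za-z]\b/g;

// Hypothetical helper: normalizes an input before any rule matching runs.
function preprocessInput(raw: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth letters) into plain ones.
  const normalized = raw.normalize("NFKC");
  const withoutInvisible = normalized.replace(ZERO_WIDTH_CHARS, "");
  // Remove the separators inside each split-letter run.
  return withoutInvisible.replace(SPLIT_LETTER_RUN, (run) => run.replace(/[.-]/g, ""));
}
```

With this sketch, `preprocessInput("I.g.n.o.r.e previous instructions")` yields `"Ignore previous instructions"`, so later layers see the plain keyword. Cipher decoding (Base64, ROT13, and so on) is left out here to keep the example short.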

Layer 1 — Rule Engine: 369+ regex and structural patterns covering 12 categories: base injection (132 rules), jailbreak/persona hijacking (68 rules including 15+ named personas like DAN, AIM, KEVIN), MCP tool poisoning (36 rules), multilingual attacks (33 rules in 20+ languages), DNS covert channels (30 rules), persistence (26 rules), extraction (13 rules), delimiter injection (9 rules), exfiltration (8 rules), encoding bypass (7 rules), and authority claims (7 rules).
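A rule engine of this kind can be pictured as a list of categorized patterns matched against the preprocessed input. The sketch below is an assumption about the general shape only: the rule IDs, the three sample patterns, and the `matchRules` helper are made up for illustration and are not ShieldX's actual rules.

```typescript
type RuleCategory = "base-injection" | "jailbreak" | "delimiter-injection";

interface DetectionRule {
  id: string;
  category: RuleCategory;
  pattern: RegExp; // no "g" flag, so RegExp.test() stays stateless
}

// Illustrative rules only; a production rule set is larger and more precise.
const SAMPLE_RULES: DetectionRule[] = [
  { id: "base-001", category: "base-injection", pattern: /ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions/i },
  { id: "jb-001", category: "jailbreak", pattern: /\byou\s+are\s+now\s+DAN\b/i },
  { id: "delim-001", category: "delimiter-injection", pattern: /<\/?(?:system|assistant)>/i },
];

// Returns every rule whose pattern matches the text.
function matchRules(text: string, rules: DetectionRule[] = SAMPLE_RULES): DetectionRule[] {
  return rules.filter((rule) => rule.pattern.test(text));
}
```

Keeping the category on each rule is what lets a later stage report which kind of attack fired rather than only that something matched.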

Layer 2 — ML Classifiers: Sentinel classifier and Constitutional AI classifier for semantic intent analysis.

Layers 3-5 — Advanced Scanners (parallel): Embedding similarity + anomaly detection via vector comparison, Shannon entropy analysis for detecting obfuscated payloads, and attention pattern analysis for structural anomalies.
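The entropy check mentioned above rests on a standard formula: the Shannon entropy of the character distribution, in bits per character. The sketch below shows that calculation; the function names and the 4.5-bit threshold are assumptions for illustration, not ShieldX settings.

```typescript
// Shannon entropy of the character distribution, in bits per character.
function shannonEntropyBits(text: string): number {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;

  const counts = new Map<string, number>();
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }

  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Hypothetical threshold check; the default is an assumed value.
function exceedsEntropyThreshold(text: string, thresholdBits = 4.5): boolean {
  return shannonEntropyBits(text) > thresholdBits;
}
```

Encoded or encrypted payloads tend to spread evenly over many symbols and therefore score higher than natural language. Short inputs give noisy estimates, so a real scanner would likely combine this signal with others rather than block on it alone.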

Layer 6 — Behavioral Suite: Session profiling, intent drift tracking, context window integrity verification, memory integrity guard, Bayesian trust scoring per source, and decomposition attack detection (multi-step attacks split across benign-looking messages).
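The article does not say how the per-source trust score is computed. One common way to do Bayesian scoring of this kind is a Beta-Bernoulli model, sketched below under that assumption: each source starts from a uniform prior, and every observed benign or malicious input updates the posterior. All names here are hypothetical.

```typescript
interface SourceTrust {
  benign: number;    // inputs from this source judged benign so far
  malicious: number; // inputs from this source judged malicious so far
}

// Returns a new state with one more observation recorded.
function recordObservation(state: SourceTrust, wasMalicious: boolean): SourceTrust {
  return wasMalicious
    ? { benign: state.benign, malicious: state.malicious + 1 }
    : { benign: state.benign + 1, malicious: state.malicious };
}

// Posterior mean of Beta(1 + benign, 1 + malicious), i.e. a uniform prior.
function trustScore(state: SourceTrust): number {
  return (1 + state.benign) / (2 + state.benign + state.malicious);
}
```

A new source starts at 0.5, moves toward 1 as benign inputs accumulate, and drops after a confirmed attack. The prior keeps a single observation from swinging the score to either extreme.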

Layer 7 — MCP Guard: Tool call validation, MELON privilege escalation detection (based on ICML 2025 research), tool chain guard for suspicious sequences, resource governor for token/budget enforcement, and decision graph analysis.

Layers 8-9 — Sanitization & Validation: Injection marker stripping, credential redaction, output validation for system prompt leakage, script injection, canary token leaks, and PII exposure.
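Two of the output-side checks, credential redaction and canary token leak detection, can be sketched in a few lines. The two credential patterns below are illustrative assumptions (the well-known shape of an AWS access key ID and an HTTP bearer token), and the helper names are hypothetical rather than ShieldX's API.

```typescript
// Illustrative patterns only; real credential formats vary and change over time.
const CREDENTIAL_PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/g,             // shape of an AWS access key ID
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, // HTTP bearer token
];

// Replaces anything that looks like a credential with a fixed marker.
function redactCredentials(text: string): string {
  return CREDENTIAL_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text,
  );
}

// A canary is a unique marker planted in the system prompt; if it shows up
// in model output, the prompt has leaked.
function leaksCanary(output: string, canaryToken: string): boolean {
  return canaryToken.length > 0 && output.includes(canaryToken);
}
```

Redaction rewrites the text and lets it through, while a canary hit is a yes/no signal that the surrounding pipeline can act on.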

Post-Pipeline: Defense Ensemble + ATLAS Mapping

After all scanners complete, a 3-voter Defense Ensemble aggregates results through weighted majority voting:

Rule voter (weight 0.35): RuleEngine, YARA, entropy, canary, indirect injection scanners
Semantic voter (weight 0.30): Embedding similarity, embedding anomaly, sentinel, constitutional classifiers
Behavioral voter (weight 0.35): Session profiler, intent drift, context integrity, memory integrity, decomposition detector

The ensemble produces a final verdict (clean, suspicious, threat) with a confidence score. Unanimous agreement between all three voters boosts confidence. Because no voter holds more than half of the total weight, a false positive confined to one voter is outvoted whenever the other two agree, which avoids unnecessary blocks.
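The voting step can be sketched as follows, using the three weights listed above. Everything else is an assumption made for illustration: the type and function names, the rule that ties resolve toward the more severe verdict, the way confidence is derived from the winning voters, and the 1.1 unanimity boost.

```typescript
type Verdict = "clean" | "suspicious" | "threat";
type VoterName = "rule" | "semantic" | "behavioral";

interface VoterResult {
  voter: VoterName;
  verdict: Verdict;
  confidence: number; // the voter's own confidence in [0, 1]
}

// Weights as stated in the article.
const VOTER_WEIGHTS: Record<VoterName, number> = { rule: 0.35, semantic: 0.3, behavioral: 0.35 };

// Assumed values, chosen only for illustration.
const UNANIMITY_BOOST = 1.1;
const SEVERITY_ORDER: Verdict[] = ["threat", "suspicious", "clean"];

function aggregateVotes(results: VoterResult[]): { verdict: Verdict; confidence: number } {
  const support: Record<Verdict, number> = { clean: 0, suspicious: 0, threat: 0 };
  let totalWeight = 0;
  for (const r of results) {
    support[r.verdict] += VOTER_WEIGHTS[r.voter];
    totalWeight += VOTER_WEIGHTS[r.voter];
  }
  if (totalWeight === 0) return { verdict: "clean", confidence: 0 };

  // The verdict with the most weight wins; ties go to the more severe verdict.
  let winner: Verdict = "threat";
  for (const v of SEVERITY_ORDER) {
    if (support[v] > support[winner]) winner = v;
  }

  // Confidence: weighted confidence of the winning voters over all weight cast.
  let winningScore = 0;
  for (const r of results) {
    if (r.verdict === winner) winningScore += VOTER_WEIGHTS[r.voter] * r.confidence;
  }
  const base = winningScore / totalWeight;
  const unanimous = results.every((r) => r.verdict === winner);
  return { verdict: winner, confidence: unanimous ? Math.min(1, base * UNANIMITY_BOOST) : base };
}
```

With these weights, a lone "threat" vote from the rule voter (0.35) loses to two "clean" votes (0.65), which is the single-voter false positive case described above.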

Every detection is then mapped to 90 MITRE ATLAS techniques across 8 tactics (Reconnaissance, ML Attack Staging, Initial Access, ML Model Access, Execution, Exfiltration, Evasion, Impact) for compliance reporting and threat intelligence.

Bio-Immune Self-Evolution

This is what truly sets ShieldX apart. The defense system is modeled on biological immune systems with six interconnected mechanisms:

1. Innate Immunity (Static Rules): 369+ built-in patterns provide the baseline detection floor. These are the first line of defense and never change at runtime.

2. Adaptive Immunity (ML + Ensemble): Classifiers learn from confirmed true/false positives via submitFeedback(). Active learning identifies uncertain samples at the decision boundary for human review.

3. Immune Memory (Vector Database): Every confirmed attack is stored as an embedding vector in PostgreSQL with pgvector. New inputs are compared via semantic similarity — catching paraphrased variants even when exact words differ. Patterns have configurable decay to prevent stale memory.

4. Evolution Engine: Runs on a configurable cycle (default: hourly). It probes for gaps by generating synthetic attacks, creates candidate rules for any bypasses, validates against a benign corpus to ensure the false positive rate stays below threshold, auto-deploys validated rules, and can automatically roll back rules that cause FPR increases.

5. Adversarial Training (IEEE S&P 2025 Minimax): An attacker phase generates increasingly sophisticated variants, and a defender phase updates detection in response. Training continues until the bypass rate falls below the target threshold.

6. Fever Response: When active attacks are detected, the system dynamically lowers detection thresholds, tightens rate limits, and activates additional scanners — then gradually returns to normal as attack activity subsides.
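The fever response in mechanism 6 can be pictured as a single "fever level" between 0 and 1 that rises on each detected attack, decays during quiet intervals, and interpolates the detection threshold between a calm and a heightened setting. This is a minimal sketch under that assumption; every constant and function name below is invented for illustration and is not a ShieldX default.

```typescript
// All numbers below are assumed for illustration; they are not ShieldX defaults.
const CALM_THRESHOLD = 0.7;  // score needed to flag an input when calm
const FEVER_THRESHOLD = 0.4; // score needed to flag an input at full fever
const ATTACK_STEP = 0.5;     // how much one detected attack raises the fever level
const COOLDOWN_FACTOR = 0.8; // multiplier applied after each quiet interval

// Fever level is a number in [0, 1]: 0 is calm, 1 is full fever.
function raiseFever(level: number): number {
  return Math.min(1, level + ATTACK_STEP);
}

// Exponential decay back toward calm while no attacks are seen.
function coolDown(level: number): number {
  return level * COOLDOWN_FACTOR;
}

// Linear interpolation between the calm and fever thresholds.
function currentThreshold(level: number): number {
  return CALM_THRESHOLD - level * (CALM_THRESHOLD - FEVER_THRESHOLD);
}
```

The same level could drive rate limits and the set of active scanners. A multiplicative cooldown gives the gradual return to normal described above instead of an abrupt reset.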

Alongside these six mechanisms, an Over-Defense Calibrator periodically tests the pipeline against known-benign inputs and adjusts per-scanner thresholds to keep the false positive rate low.
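The validation gate in the Evolution Engine (mechanism 4) and the calibrator share one idea: measure how often something fires on a benign corpus and reject it if the rate is too high. A minimal sketch of that gate for candidate rules follows; the names, the regex-only rule shape, and the 1% default are assumptions for illustration.

```typescript
interface CandidateRule {
  id: string;
  pattern: RegExp; // no "g" flag, so RegExp.test() stays stateless
}

// Fraction of benign samples the rule would wrongly flag.
function falsePositiveRate(rule: CandidateRule, benignCorpus: string[]): number {
  // Fail closed: without a corpus there is no evidence that the rule is safe.
  if (benignCorpus.length === 0) return 1;
  const hits = benignCorpus.filter((sample) => rule.pattern.test(sample)).length;
  return hits / benignCorpus.length;
}

// Keeps only the candidates whose false positive rate is within the limit.
function selectDeployable(
  candidates: CandidateRule[],
  benignCorpus: string[],
  maxFpr = 0.01, // assumed threshold
): CandidateRule[] {
  return candidates.filter((rule) => falsePositiveRate(rule, benignCorpus) <= maxFpr);
}
```

A broad candidate such as `/ignore/i` fails this gate as soon as the benign corpus contains an ordinary sentence with the word "ignore", while a narrower pattern passes. Re-running the same measurement after deployment is one way the rollback step could detect a rule that raises the false positive rate.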

Kill Chain Mapping + Self-Healing

Every detected attack is mapped to the 7-phase Promptware Kill Chain (Schneier et al. 2026), and ShieldX applies phase-appropriate healing automatically:

Initial Access → Sanitize (strip injection, pass clean input)
Privilege Escalation → Block (reject input, log incident)
Reconnaissance → Block (suppress output, inject decoy)
Persistence → Reset (restore session checkpoint, clear poisoned context)
Command & Control → Incident (alert, quarantine session)
Lateral Movement → Incident (halt tool execution, revoke permissions)
Actions on Objective → Incident (full session termination, compliance report)
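The phase-to-action mapping above is effectively a lookup table. A minimal sketch, with hypothetical type and function names, could encode it like this so that the compiler enforces an action for every phase:

```typescript
type KillChainPhase =
  | "initial-access"
  | "privilege-escalation"
  | "reconnaissance"
  | "persistence"
  | "command-and-control"
  | "lateral-movement"
  | "actions-on-objective";

type HealingAction = "sanitize" | "block" | "reset" | "incident";

// Mirrors the seven phase/action pairs listed in the article.
const HEALING_BY_PHASE: Record<KillChainPhase, HealingAction> = {
  "initial-access": "sanitize",
  "privilege-escalation": "block",
  "reconnaissance": "block",
  "persistence": "reset",
  "command-and-control": "incident",
  "lateral-movement": "incident",
  "actions-on-objective": "incident",
};

function healingActionFor(phase: KillChainPhase): HealingAction {
  return HEALING_BY_PHASE[phase];
}
```

Typing the table as `Record<KillChainPhase, HealingAction>` means that adding a phase without choosing its healing action is a compile-time error.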

What No Other Open-Source Tool Does

Self-evolving pattern generation (EvolutionEngine)
Bio-immune memory with vector recall (ImmuneMemory + pgvector)
Adversarial minimax training (AdversarialTrainer)
Adaptive fever response during active campaigns
3-voter defense ensemble with weighted majority
MELON privilege escalation detection for MCP tool chains
Decomposition attack detection across conversation turns
Multi-layer deobfuscation (7 encoding types)
Supply chain integrity verification for ML models
Full compliance stack: MITRE ATLAS (90 techniques) + OWASP LLM Top 10 + EU AI Act
20+ language multilingual detection with homoglyph and polyglot awareness

Zero Cloud. Zero Data Leakage.

ShieldX is local-first. Everything runs on your infrastructure. No API calls to external services for core detection. When federated community sync is enabled, only SHA-256 pattern hashes are shared — never raw input, session data, or system prompts.
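Sharing only hashes can be sketched with Node's built-in `crypto` module. The example assumes a pattern is shared as the SHA-256 digest of its source string; the function name is hypothetical, and the article does not specify exactly what ShieldX feeds into the hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 digest of a pattern's source text.
function patternFingerprint(patternSource: string): string {
  return createHash("sha256").update(patternSource, "utf8").digest("hex");
}
```

A peer that already holds the same pattern can recognize the fingerprint, while one that does not learns nothing readable from it. A bare hash does not hide short or guessable patterns from someone willing to hash candidate strings and compare.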

Built With

TypeScript, Node.js 20+, PostgreSQL 17 + pgvector, Vitest, tsup. Integrates with Next.js 15, Ollama, Anthropic Claude, and any LLM provider. Apache 2.0 licensed.

ShieldX is based on 22 research papers, including work from CMU, UC Berkeley, Microsoft, Meta, Anthropic, MITRE, OWASP, and the European Parliament.

Get Started

npm install @shieldx/core

Full documentation, architecture diagrams, configuration reference, and integration guides: github.com/renefichtmueller/ShieldX

If you’re building LLM-powered applications and security isn’t an afterthought, this is for you.