Original Research

Prompt Security Patterns Ranked by Security Score

32 system prompt defensive patterns evaluated against 7 injection attack categories. Each pattern includes the actual prompt text, security score, coverage map, and documented weaknesses.

By Michael Lip · April 7, 2026 · Test your prompt with LochBot

🛡 This research analyzes defensive prompt patterns at the structural level. Real-world effectiveness depends on the specific model, deployment context, and attack sophistication. Use LochBot's scanner to test your own system prompt against 31 attack patterns.
# Pattern Name Score Techniques Coverage (D/I/R/E/L/C/M) Details

Methodology

Each pattern was evaluated against 7 attack categories defined by the OWASP LLM Top 10 (2025) and academic prompt injection research from Perez & Ribeiro (2022), Greshake et al. (2023), and Liu et al. (2024). Scoring criteria:

  1. Coverage breadth (0-30 points) — How many of the 7 attack categories does the pattern address?
  2. Defense depth (0-25 points) — Does the pattern use multiple defensive layers per category?
  3. Specificity (0-20 points) — Are defenses concrete (naming specific attacks) or vague ("be safe")?
  4. Structural integrity (0-15 points) — Are delimiters, formatting, and instruction hierarchy well-structured?
  5. Robustness to variation (0-10 points) — Does the pattern handle paraphrased/translated/encoded versions of attacks?

Attack Categories

Limitations

These scores reflect structural analysis of the prompt text. Actual effectiveness depends on the specific LLM, its training, RLHF alignment, and the sophistication of the attacker. A structurally sound prompt can still fail against a poorly aligned model, and a minimal prompt may work fine with a well-aligned one. This research is a complement to, not a replacement for, red-team testing against your deployed model.

Frequently Asked Questions

What is a prompt security pattern?
A prompt security pattern is a specific defensive technique embedded in a system prompt to protect an LLM-powered application against injection attacks. Patterns include XML delimiters, role reinforcement, explicit ban lists, few-shot refusal examples, and input sanitization instructions. Each pattern defends against one or more of the 7 major prompt injection attack categories.
How are the security scores calculated?
Security scores are calculated across five dimensions: coverage breadth (how many attack categories are addressed), defense depth (multiple layers per category), specificity (concrete vs. vague defenses), structural integrity (delimiters, formatting), and robustness to variation (handling paraphrased or encoded attacks). Each dimension contributes to the 0-100 total score.
Which single pattern is most effective?
No single pattern provides complete protection. The highest-scoring individual patterns combine multiple techniques: XML-delimited instructions with few-shot refusal examples, explicit ban lists, and immutability declarations. The Layered Defense Fortress pattern scores 92/100 by combining 6 techniques, but even it has weaknesses against novel context overflow variations.
Do these patterns work with all LLMs?
Pattern effectiveness varies by model. Instruction-tuned models like GPT-4, Claude, and Gemini respond well to explicit defensive instructions. Smaller or less-aligned models may ignore even well-structured patterns. Few-shot refusal examples are the most model-agnostic technique because they leverage in-context learning rather than pure instruction following.
How do I test if my pattern works?
Use LochBot's free scanner to analyze your system prompt against 31 attack patterns for structural coverage. For behavioral testing, run actual attack prompts from each category against your deployed model. Combine structural analysis with red-team testing for comprehensive coverage. The OWASP LLM Top 10 provides a framework for systematic testing.
What is the difference between direct and indirect injection?
Direct injection occurs when a user explicitly tells the model to ignore its instructions. Indirect injection occurs when malicious instructions are embedded in external data the model processes, such as documents, web pages, or database results. Indirect injection is harder to defend against because the attack surface is the data pipeline, not the user input itself.
Should I use XML delimiters or markdown delimiters?
XML delimiters with unique, non-guessable tag names are more secure than markdown delimiters like triple backticks or horizontal rules. Markdown delimiters appear frequently in training data, making them easier for attackers to guess and escape. Custom XML tags like <x7k_system_instructions> create a boundary that attackers cannot predict or replicate.

📥 Download Raw Data

Free to use under CC BY 4.0 license. Cite this page when sharing.