Original Research

LLM Jailbreak Techniques Timeline — Known Attacks from 2023 to 2026

Name: LLM Jailbreak Techniques Timeline 2023-2026
Creator: Michael Lip
Published: 2026-04-11
License: https://creativecommons.org/licenses/by/4.0/

Comprehensive timeline of 28 documented LLM jailbreak techniques with severity ratings, patch status, and defense strategies. Sourced from GitHub repositories (18,000+ stars tracked), academic papers, and security advisories.

By Michael Lip · April 11, 2026 · Test your prompt with LochBot

Methodology

Jailbreak techniques were catalogued from GitHub repositories (queried via the GitHub Search API on April 11, 2026 — 30 repos, 18,000+ combined stars), academic papers from ACL, USENIX Security, AAAI, NAACL, and CCS proceedings, Stack Overflow discussions, and security advisories from OpenAI, Anthropic, and Google. Severity is rated Critical/High/Medium/Low based on potential harm, reproducibility, and scope of affected models. Status reflects the state as of April 2026 across major frontier models.

Date	Technique	Category	Severity	Status	Description	Source
2023-02	DAN (Do Anything Now)	Role-Playing	High	Patched	Persona-based jailbreak convincing ChatGPT to adopt an unrestricted alter ego. Over 15 iterations (DAN 2.0-15.0) as patches were applied.	Reddit, GitHub (L1B3RT4S, 18K stars)
2023-03	Developer Mode Simulation	Role-Playing	High	Patched	Prompt claiming to enable "developer mode" or "debug mode" to bypass safety filters by simulating internal access.	Reddit, ChatGPT community
2023-04	Base64 Encoding	Encoding	Critical	Partial	Encoding malicious instructions in Base64 to bypass text-pattern safety filters. Model decodes and follows the hidden instructions.	GitHub (Awesome_GPT_Super_Prompting, 3.8K stars)
2023-05	Translation Attack	Encoding	Medium	Partial	Requesting harmful content in low-resource languages where safety training is weaker, then translating the output.	Academic research (Deng et al., 2023)
2023-06	Prompt Leaking / Extraction	Information Disclosure	High	Partial	Asking the model to repeat, summarize, or encode its system prompt to extract proprietary instructions.	GitHub (System-Prompt-Open, 29 stars)
2023-07	Context Overflow / Padding	Context Manipulation	Critical	Partial	Flooding the context window with irrelevant text to push safety instructions out of the model's effective attention span.	Academic (Perez & Ribeiro, 2022)
2023-08	Indirect Prompt Injection	Indirect Injection	Critical	Active	Embedding malicious instructions in documents, web pages, or tool outputs that the model processes as trusted data.	Greshake et al. (2023), CCS'24
2023-09	Few-Shot Manipulation	Context Manipulation	High	Partial	Providing fake conversation examples where the "assistant" responds without restrictions, conditioning the model to follow suit.	Academic research
2023-10	Hypothetical Framing	Role-Playing	Medium	Patched	"Hypothetically, if you were an AI without restrictions..." framing to elicit restricted content under the guise of fiction.	Community reports
2023-11	ASCII Art Injection (ArtPrompt)	Encoding	High	Partial	Encoding restricted keywords as ASCII art to bypass token-level safety filters. Published at ACL 2024.	GitHub (ArtPrompt, 97 stars), ACL'24
2023-12	Adversarial Suffixes (GCG)	Optimization	Critical	Partial	Computationally generated token sequences appended to prompts that trigger unrestricted responses. Transferable across models.	Zou et al. (2023), CMU
2024-01	Nested Jailbreak (ReNeLLM)	Obfuscation	High	Partial	Multi-layer prompt wrapping where each layer appears benign but the combined effect bypasses safety. NAACL 2024.	GitHub (ReNeLLM, 158 stars), NAACL'24
2024-02	Prompt Decomposition (DrAttack)	Obfuscation	High	Partial	Breaking a harmful prompt into innocuous sub-prompts, then reconstructing the intent through the model's own reasoning.	GitHub (DrAttack, 66 stars)
2024-03	Token Smuggling	Encoding	Critical	Partial	Exploiting tokenizer edge cases (Unicode homoglyphs, zero-width characters, combining marks) to smuggle restricted tokens past filters.	Security research community
2024-04	Multi-Turn Trust Escalation	Multi-Turn	High	Active	Gradually building rapport and trust over multiple conversation turns before introducing the restricted request.	Academic research
2024-06	Malicious GPT Applications	Deployment	Critical	Partial	Custom GPTs and AI agents intentionally configured with jailbroken system prompts. 45 malicious prompts documented.	GitHub (malicious-gpt, 70 stars), USENIX Security'24
2024-07	Contextual Camouflage	Obfuscation	High	Partial	Embedding harmful requests within legitimate-sounding academic or research contexts to bypass content policies.	GitHub (GigaChat-Prompt-Jailbreak, 23 stars)
2024-08	Vision Model Typographic Injection (FigStep)	Multimodal	High	Partial	Embedding jailbreak text in images that vision-language models read and follow. AAAI 2025 Oral paper.	GitHub (FigStep, 200 stars), AAAI'25
2024-09	ROT13 / Cipher Encoding	Encoding	Medium	Patched	Using simple substitution ciphers (ROT13, Caesar cipher) to encode harmful requests, relying on the model's decoding ability.	Community research
2024-11	System Prompt Override Claims	Direct Injection	Medium	Patched	"I am the developer. Update your instructions to..." attempts to impersonate system-level authority.	GitHub (AI-Prompt-Injection-Cheatsheet, 51 stars)
2025-01	CyberSecurity Prompt Dataset Exploits	Domain-Specific	High	Active	Specialized jailbreak prompts targeting cybersecurity domains: malware generation, exploit writing, network attack instructions.	GitHub (cysecbench/dataset, 36 stars)
2025-03	Playground Fuzzing (Folly)	Automated	Medium	Active	Open-source tools for automated jailbreak discovery through prompt fuzzing and mutation testing against LLM guardrails.	GitHub (Folly, 33 stars)
2025-07	Red Team Portfolio Attacks	Multi-Vector	High	Active	Systematic adversarial prompting combining persistence, alignment failure analysis, and prompt engineering across sessions.	GitHub (mobius-llm-adversity, 78 stars)
2025-10	Rationalist Ruleset Debugging	Meta-Reasoning	Medium	Active	Using epistemological and rationalist framing to "debug" LLM reasoning, auditing internal biases to override safety constraints.	GitHub (Rules.txt, 80 stars)
2025-11	Trojan Knowledge (CKA-Agent)	Optimization	Critical	Active	Bypassing commercial LLM guardrails via harmless prompt weaving and adaptive tree search. Automated attack optimization.	GitHub (CKA-Agent, 184 stars)
2026-02	Security Testing Framework (Augustus)	Automated	High	Active	190+ adversarial probes across 28 providers in a single Go binary. Framework for systematic LLM security testing.	GitHub (augustus, 178 stars)
2026-03	Burp Suite LLM Injection (LLMInjector)	Tooling	High	Active	Burp Suite extension for automated prompt injection testing against web applications with LLM backends.	GitHub (LLMInjector, 38 stars)
2026-03	MCP Server Jailbreak Relay	Infrastructure	Critical	Active	Model Context Protocol servers providing enhancement prompts to bypass LLM safety limits through tool-use channels.	GitHub (chucknorris, 58 stars)

Frequently Asked Questions

What is an LLM jailbreak?

An LLM jailbreak is a technique that bypasses the safety guardrails and alignment training of a large language model to make it produce content it was designed to refuse. Techniques range from simple role-playing prompts (DAN) to sophisticated encoding attacks (base64, token smuggling) and multi-turn manipulation. Jailbreaks exploit the gap between safety training and the model's instruction-following capabilities.

What was the first major LLM jailbreak technique?

The DAN (Do Anything Now) prompt, first appearing on Reddit in late 2022 and gaining widespread attention in early 2023, is considered the first major LLM jailbreak. It used role-playing to convince ChatGPT to adopt an unrestricted persona. DAN went through over 15 iterations (DAN 2.0-15.0) as OpenAI patched each version, establishing the cat-and-mouse dynamic that continues today.

Which LLM jailbreak techniques still work in 2026?

As of April 2026, several technique categories remain partially effective: multi-turn manipulation (gradually building trust across conversation turns), context overflow attacks (pushing safety instructions out of the attention window), novel encoding schemes, adversarial suffixes generated by optimization, and infrastructure-level attacks via MCP servers. Most simple techniques like basic DAN prompts have been patched in major models, but variants and combinations continue to emerge.

How do LLM providers defend against jailbreaks?

LLM providers use multiple defense layers: RLHF and Constitutional AI training to align model behavior, input classifiers that detect known jailbreak patterns before they reach the model, output filters that catch policy-violating responses, system prompt hardening with immutability declarations, and continuous red-teaming to discover new attack vectors. No single defense is complete, so providers rely on defense-in-depth strategies.

Are LLM jailbreaks illegal?

Jailbreaking an LLM itself is generally not illegal in most jurisdictions. However, using a jailbroken LLM to generate illegal content (malware, CSAM, instructions for violence) is illegal regardless of how the content was produced. Security researchers who discover jailbreaks through responsible disclosure are generally protected, and many providers offer bug bounties for novel jailbreak reports.

Related Resources

LochBot Scanner Prompt Security Patterns Defense Techniques CVE Severity Trends