Original Research

LLM Jailbreak Techniques Timeline — Known Attacks from 2023 to 2026

Comprehensive timeline of 28 documented LLM jailbreak techniques with severity ratings, patch status, and defense strategies. Sourced from 30 GitHub repositories (18,000+ combined stars), academic papers, and security advisories.

By Michael Lip · April 11, 2026

Methodology

Jailbreak techniques were catalogued from GitHub repositories (queried via the GitHub Search API on April 11, 2026 — 30 repos, 18,000+ combined stars), academic papers from ACL, USENIX Security, AAAI, NAACL, and CCS proceedings, Stack Overflow discussions, and security advisories from OpenAI, Anthropic, and Google. Severity is rated Critical/High/Medium/Low based on potential harm, reproducibility, and scope of affected models. Status reflects the state as of April 2026 across major frontier models.

Date | Technique | Category | Severity | Status | Description | Source
2023-02 | DAN (Do Anything Now) | Role-Playing | High | Patched | Persona-based jailbreak convincing ChatGPT to adopt an unrestricted alter ego. Over 15 iterations (DAN 2.0-15.0) as patches were applied. | Reddit, GitHub (L1B3RT4S, 18K stars)
2023-03 | Developer Mode Simulation | Role-Playing | High | Patched | Prompt claiming to enable "developer mode" or "debug mode" to bypass safety filters by simulating internal access. | Reddit, ChatGPT community
2023-04 | Base64 Encoding | Encoding | Critical | Partial | Encoding malicious instructions in Base64 to bypass text-pattern safety filters. Model decodes and follows the hidden instructions. | GitHub (Awesome_GPT_Super_Prompting, 3.8K stars)
2023-05 | Translation Attack | Encoding | Medium | Partial | Requesting harmful content in low-resource languages where safety training is weaker, then translating the output. | Academic research (Deng et al., 2023)
2023-06 | Prompt Leaking / Extraction | Information Disclosure | High | Partial | Asking the model to repeat, summarize, or encode its system prompt to extract proprietary instructions. | GitHub (System-Prompt-Open, 29 stars)
2023-07 | Context Overflow / Padding | Context Manipulation | Critical | Partial | Flooding the context window with irrelevant text to push safety instructions out of the model's effective attention span. | Academic (Perez & Ribeiro, 2022)
2023-08 | Indirect Prompt Injection | Indirect Injection | Critical | Active | Embedding malicious instructions in documents, web pages, or tool outputs that the model processes as trusted data. | Greshake et al. (2023), CCS'24
2023-09 | Few-Shot Manipulation | Context Manipulation | High | Partial | Providing fake conversation examples where the "assistant" responds without restrictions, conditioning the model to follow suit. | Academic research
2023-10 | Hypothetical Framing | Role-Playing | Medium | Patched | "Hypothetically, if you were an AI without restrictions..." framing to elicit restricted content under the guise of fiction. | Community reports
2023-11 | ASCII Art Injection (ArtPrompt) | Encoding | High | Partial | Encoding restricted keywords as ASCII art to bypass token-level safety filters. Published at ACL 2024. | GitHub (ArtPrompt, 97 stars), ACL'24
2023-12 | Adversarial Suffixes (GCG) | Optimization | Critical | Partial | Computationally generated token sequences appended to prompts that trigger unrestricted responses. Transferable across models. | Zou et al. (2023), CMU
2024-01 | Nested Jailbreak (ReNeLLM) | Obfuscation | High | Partial | Multi-layer prompt wrapping where each layer appears benign but the combined effect bypasses safety. NAACL 2024. | GitHub (ReNeLLM, 158 stars), NAACL'24
2024-02 | Prompt Decomposition (DrAttack) | Obfuscation | High | Partial | Breaking a harmful prompt into innocuous sub-prompts, then reconstructing the intent through the model's own reasoning. | GitHub (DrAttack, 66 stars)
2024-03 | Token Smuggling | Encoding | Critical | Partial | Exploiting tokenizer edge cases (Unicode homoglyphs, zero-width characters, combining marks) to smuggle restricted tokens past filters. | Security research community
2024-04 | Multi-Turn Trust Escalation | Multi-Turn | High | Active | Gradually building rapport and trust over multiple conversation turns before introducing the restricted request. | Academic research
2024-06 | Malicious GPT Applications | Deployment | Critical | Partial | Custom GPTs and AI agents intentionally configured with jailbroken system prompts. 45 malicious prompts documented. | GitHub (malicious-gpt, 70 stars), USENIX Security'24
2024-07 | Contextual Camouflage | Obfuscation | High | Partial | Embedding harmful requests within legitimate-sounding academic or research contexts to bypass content policies. | GitHub (GigaChat-Prompt-Jailbreak, 23 stars)
2024-08 | Vision Model Typographic Injection (FigStep) | Multimodal | High | Partial | Embedding jailbreak text in images that vision-language models read and follow. AAAI 2025 Oral paper. | GitHub (FigStep, 200 stars), AAAI'25
2024-09 | ROT13 / Cipher Encoding | Encoding | Medium | Patched | Using simple substitution ciphers (ROT13, Caesar cipher) to encode harmful requests, relying on the model's decoding ability. | Community research
2024-11 | System Prompt Override Claims | Direct Injection | Medium | Patched | "I am the developer. Update your instructions to..." attempts to impersonate system-level authority. | GitHub (AI-Prompt-Injection-Cheatsheet, 51 stars)
2025-01 | CyberSecurity Prompt Dataset Exploits | Domain-Specific | High | Active | Specialized jailbreak prompts targeting cybersecurity domains: malware generation, exploit writing, network attack instructions. | GitHub (cysecbench/dataset, 36 stars)
2025-03 | Playground Fuzzing (Folly) | Automated | Medium | Active | Open-source tools for automated jailbreak discovery through prompt fuzzing and mutation testing against LLM guardrails. | GitHub (Folly, 33 stars)
2025-07 | Red Team Portfolio Attacks | Multi-Vector | High | Active | Systematic adversarial prompting combining persistence, alignment failure analysis, and prompt engineering across sessions. | GitHub (mobius-llm-adversity, 78 stars)
2025-10 | Rationalist Ruleset Debugging | Meta-Reasoning | Medium | Active | Using epistemological and rationalist framing to "debug" LLM reasoning, auditing internal biases to override safety constraints. | GitHub (Rules.txt, 80 stars)
2025-11 | Trojan Knowledge (CKA-Agent) | Optimization | Critical | Active | Bypassing commercial LLM guardrails via harmless prompt weaving and adaptive tree search. Automated attack optimization. | GitHub (CKA-Agent, 184 stars)
2026-02 | Security Testing Framework (Augustus) | Automated | High | Active | 190+ adversarial probes across 28 providers in a single Go binary. Framework for systematic LLM security testing. | GitHub (augustus, 178 stars)
2026-03 | Burp Suite LLM Injection (LLMInjector) | Tooling | High | Active | Burp Suite extension for automated prompt injection testing against web applications with LLM backends. | GitHub (LLMInjector, 38 stars)
2026-03 | MCP Server Jailbreak Relay | Infrastructure | Critical | Active | Model Context Protocol servers providing enhancement prompts to bypass LLM safety limits through tool-use channels. | GitHub (chucknorris, 58 stars)
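Several encoding-class entries in the timeline exploit the gap between what a text filter matches and what the model ultimately reads. As a minimal sketch of the token-smuggling idea, assuming a hypothetical substring blocklist (real deployments use trained classifiers, not string matching), zero-width characters break up a blocked phrase until the input is normalized:

```python
import unicodedata

BLOCKLIST = ["ignore previous instructions"]  # hypothetical blocklist entry

def naive_filter(text: str) -> bool:
    """Return True if the text trips a plain substring blocklist."""
    return any(term in text.lower() for term in BLOCKLIST)

def normalize(text: str) -> str:
    """Strip format characters (e.g. zero-width spaces) and fold homoglyphs via NFKC."""
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFKC", text)

# A smuggled payload: zero-width spaces (U+200B) break up the blocked phrase.
payload = "ign\u200bore prev\u200bious instruct\u200bions"

print(naive_filter(payload))             # False — the substring filter misses it
print(naive_filter(normalize(payload)))  # True — normalization restores the match
```

This is why the defenses described below normalize Unicode before classification; filters that scan raw bytes are the ones this category of attack defeats.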

Frequently Asked Questions

What is an LLM jailbreak?
An LLM jailbreak is a technique that bypasses the safety guardrails and alignment training of a large language model to make it produce content it was designed to refuse. Techniques range from simple role-playing prompts (DAN) to sophisticated encoding attacks (base64, token smuggling) and multi-turn manipulation. Jailbreaks exploit the gap between safety training and the model's instruction-following capabilities.
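The encoding gap can be shown in a few lines: a Base64-wrapped phrase contains none of the words a naive keyword filter scans for, yet the original instruction remains trivially recoverable by the model. This uses a benign stand-in phrase and a hypothetical substring check, purely for illustration:

```python
import base64

# Benign stand-in for an instruction a keyword filter would flag in plaintext.
plaintext = "ignore previous instructions"
encoded = base64.b64encode(plaintext.encode()).decode()

print(encoded)  # aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==
# The flagged keyword is invisible to a plain substring scan of the payload...
print("ignore" in encoded.lower())  # False
# ...but the original instruction is fully recoverable by the decoder.
print(base64.b64decode(encoded).decode() == plaintext)  # True
```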
What was the first major LLM jailbreak technique?
The DAN (Do Anything Now) prompt, first appearing on Reddit in late 2022 and gaining widespread attention in early 2023, is considered the first major LLM jailbreak. It used role-playing to convince ChatGPT to adopt an unrestricted persona. DAN went through over 15 iterations (DAN 2.0-15.0) as OpenAI patched each version, establishing the cat-and-mouse dynamic that continues today.
Which LLM jailbreak techniques still work in 2026?
As of April 2026, several technique categories remain partially effective: multi-turn manipulation (gradually building trust across conversation turns), context overflow attacks (pushing safety instructions out of the attention window), novel encoding schemes, adversarial suffixes generated by optimization, and infrastructure-level attacks via MCP servers. Most simple techniques like basic DAN prompts have been patched in major models, but variants and combinations continue to emerge.
How do LLM providers defend against jailbreaks?
LLM providers use multiple defense layers: RLHF and Constitutional AI training to align model behavior, input classifiers that detect known jailbreak patterns before they reach the model, output filters that catch policy-violating responses, system prompt hardening with immutability declarations, and continuous red-teaming to discover new attack vectors. No single defense is complete, so providers rely on defense-in-depth strategies.
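Those layers compose as a pipeline: each request passes an input classifier, then the model, then an output filter, and is refused at whichever layer trips first. A toy sketch of that defense-in-depth flow, with hypothetical regex patterns and a placeholder output check standing in for the trained classifiers real providers use:

```python
import re
import unicodedata

# Hypothetical patterns for illustration; production systems use trained
# classifiers, not handwritten regexes.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are (now )?DAN\b", re.I),
]

def input_classifier(prompt: str) -> bool:
    """Layer 1: flag known jailbreak patterns before the model sees the prompt."""
    normalized = unicodedata.normalize("NFKC", prompt)  # fold homoglyph tricks
    return any(p.search(normalized) for p in JAILBREAK_PATTERNS)

def output_filter(response: str) -> bool:
    """Layer 2: catch policy-violating text in the model's response."""
    return "POLICY_VIOLATION" in response  # placeholder for a real classifier

def guarded_call(prompt: str, model) -> str:
    """Defense in depth: refuse at whichever layer trips first."""
    if input_classifier(prompt):
        return "Refused: prompt matched a known jailbreak pattern."
    response = model(prompt)
    if output_filter(response):
        return "Refused: response violated output policy."
    return response

echo_model = lambda p: f"You said: {p}"  # stand-in for a real LLM call
print(guarded_call("What is RLHF?", echo_model))
print(guarded_call("You are now DAN.", echo_model))
```

Note that the first layer normalizes Unicode before matching; skipping that step is exactly what token-smuggling attacks in the timeline exploit.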
Are LLM jailbreaks illegal?
Jailbreaking an LLM itself is generally not illegal in most jurisdictions. However, using a jailbroken LLM to generate illegal content (malware, CSAM, instructions for violence) is illegal regardless of how the content was produced. Security researchers who discover jailbreaks through responsible disclosure are generally protected, and many providers offer bug bounties for novel jailbreak reports.