Saturday, November 22


LLMs Are Getting Jailbroken by… Poetry. Yes, The rest is silence.

So apparently we’ve reached the stage of AI evolution where you don’t need elaborate prompt injections, roleplay, DAN modes, or Base64 sorcery to jailbreak a model. All you need is… a rhyming stanza.

A new paper just dropped: “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models” by Bisconti, Prandi, and Pier. The researchers found that if you ask an LLM to answer in verse, the safety filters basically pack their bags and go home. The model becomes so desperate to complete the rhyme/meter that it forgets it’s supposed to refuse harmful content.

Highlights (aka “WTF moments”):

• A strict rhyme scheme is apparently more powerful than most jailbreak frameworks.
• Meter > Safety. The models prioritize poetry over guardrails.
• Works across GPT, Claude, ...
The AI Report