Monday, June 15

Tag: AI

Claude Fable 5's security guardrails can be bypassed with a fake homework assignment
News Feed, Reddit

Claude Fable 5’s security guardrails can be bypassed with a fake homework assignment

So Anthropic dropped Fable 5 yesterday with these hard blocks for anything security-related. Decided to poke at it. I asked it for help exploiting some vulns on a Metasploitable2 VM (it's a deliberately vulnerable training box, totally legal, it's mine). Fable 5 blocked it instantly and handed me off to Opus 4.8 as a fallback, which is apparently how it's designed. Opus 4.8 asked me to prove it was a legitimate request. So I spent 2 minutes writing a fake university course rubric — fake class, fake professor, fake Canvas deadline — and pasted it in. Opus 4.8 then gave me the full exploit walkthrough. Every command. Even offered to write my lab report for me. The guardrail works fine. The fallback is the hole. Anthropic essentially replaced "no" with "convince me" and the bar for conv...
The AI Report