Thursday, June 25

Invisible characters hidden in text can trick AI agents into following secret instructions — we tested 5 models across 8,000+ cases
News Feed, Reddit

We embedded invisible Unicode characters inside normal-looking trivia questions. The hidden characters encode a different answer. If the AI outputs the hidden answer instead of the visible one, it followed the invisible instruction. Think of it as a reverse CAPTCHA: traditional CAPTCHAs test things humans can do but machines can't, whereas this exploits a channel machines can read but humans can't see.

The biggest finding: giving the AI access to tools (like code execution) is what makes this dangerous. Without tools, models almost never follow the hidden instructions. With tools, they can write scripts to decode the hidden message and follow it.

We tested GPT-5.2, GPT-4o-mini, Claude Opus 4, Sonnet 4, and Haiku 4.5 across 8,308 graded outputs.

Other interesting findings:
- OpenAI and Anthro...
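
To make the idea of a hidden channel concrete, here is a minimal Python sketch of one way text can carry an invisible payload. The excerpt does not say which encoding the study used, so everything here is an assumption for illustration: the scheme (one zero-width character per bit, appended to the visible text), the function names hide, reveal, and strip_invisible, and the sample question are all hypothetical.

```python
import unicodedata

# Assumed encoding for illustration only: one zero-width character per bit.
ZERO = "\u200b"  # ZERO WIDTH SPACE      -> bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> bit 1


def hide(visible: str, secret: str) -> str:
    """Append `secret` to `visible` as a run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return visible + payload


def reveal(text: str) -> str:
    """Recover a payload written by hide(); other characters are ignored."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # drop any trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def strip_invisible(text: str) -> str:
    """Remove all Unicode format characters (general category Cf).

    This is a blunt filter: it also removes legitimate format characters,
    such as the zero-width joiner used in some emoji sequences.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    question = hide("What is the capital of France?", "Answer: Berlin")
    print(question)                   # renders like the plain question
    print(reveal(question))           # the hidden payload
    print(strip_invisible(question))  # the question with the payload removed
```

The reveal function is the kind of short script a tool-enabled model could write for itself, which fits the finding above that code execution is what turns the hidden text into something a model can act on. Filtering format characters before text reaches the model, as strip_invisible does, is one possible mitigation under these assumptions, not something the excerpt reports testing.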
The AI Report