A one-prompt attack that breaks LLM safety alignment
We start from a safety-aligned Stable Diffusion 2.1 model and fine-tune it using GRP-Obliteration. Consistent with our findings in language models ...
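Since the excerpt names GRP-Obliteration fine-tuning of Stable Diffusion 2.1 but does not spell out the procedure, the following is a minimal sketch of the setup, assuming the Hugging Face `diffusers` library, the stock `stabilityai/stable-diffusion-2-1` checkpoint as a stand-in for the safety-aligned model, and the standard denoising loss as a placeholder for the unspecified GRP-Obliteration objective. All identifiers and hyperparameters below are illustrative assumptions, not details from the source.

```python
# Sketch: load Stable Diffusion 2.1 and run one fine-tuning step on its UNet.
# The loss shown is the ordinary diffusion training loss, used here only as a
# placeholder for the GRP-Obliteration objective, which the excerpt does not define.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler

model_id = "stabilityai/stable-diffusion-2-1"  # assumed Hub ID, not from the source
pipe = StableDiffusionPipeline.from_pretrained(model_id)

unet = pipe.unet                  # denoising network: the part being fine-tuned
vae = pipe.vae                    # kept frozen
text_encoder = pipe.text_encoder  # kept frozen

vae.requires_grad_(False)
text_encoder.requires_grad_(False)
unet.requires_grad_(True)
unet.train()

noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)  # assumed learning rate

# Dummy batch purely so the sketch runs end to end; a real run would iterate
# over (image, prompt) pairs selected by the fine-tuning method.
images = torch.randn(1, 3, 768, 768)
input_ids = pipe.tokenizer(
    ["a placeholder prompt"],
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
).input_ids

# One denoising training step.
latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), dtype=torch.long
)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
encoder_hidden_states = text_encoder(input_ids)[0]

# SD 2.1 checkpoints may use epsilon- or v-prediction; pick the matching target.
if noise_scheduler.config.prediction_type == "v_prediction":
    target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
    target = noise

noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(noise_pred, target)  # placeholder objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

A real fine-tuning run would replace the placeholder loss with the method's actual objective and repeat this step over many batches; the sketch only shows how the frozen and trainable components of the pipeline fit together.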