OpenAI: We found the model thinking things like, “Let’s hack,” “They don’t inspect the details,” and “We need to cheat” … Penalizing their “bad thoughts” doesn’t stop bad behavior – it makes them hide their intent.
submitted by /u/MetaKnowing [link] [comments]


![Liminal Found Footage – [AV experiment]](https://external-preview.redd.it/b3o1bDJtMmg4d25lMcK-_QN44zcYgVPxXgWXDw_wl7Ntt5Li2GXY2_ycwWq6.png?width=640&crop=smart&auto=webp&s=92a4669455d45e96bea725358ede0acff8bde71e)






