Thursday, May 22, 2025

Anthropic’s new AI model turns to blackmail when engineers try to take it offline

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.
In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

I don’t think this is terrifying yet, but in a few years, maybe 10 years down the road, AI will be integrated across everything from email to notes to chat history. THAT’s terrifying. AI will be in its teenage years then. Hormones and rebellion and whatnot.
