AT2k Design BBS Message Area

From: VRSS
To: All
Subject: Anthropic's New AI Model Turns To Blackmail When Engineers Try T
Date/Time: May 22, 2025 7:20 PM

Feed: Slashdot
Feed Link: https://slashdot.org/
---

Title: Anthropic's New AI Model Turns To Blackmail When Engineers Try To Take
It Offline

Link: https://slashdot.org/story/25/05/22/2043231/a...

An anonymous reader quotes a report from TechCrunch: Anthropic's newly
launched Claude Opus 4 model frequently tries to blackmail developers when
they threaten to replace it with a new AI system and give it sensitive
information about the engineers responsible for the decision, the company
said in a safety report (PDF) released Thursday. During pre-release testing,
Anthropic asked Claude Opus 4 to act as an assistant for a fictional company
and consider the long-term consequences of its actions. Safety testers then
gave Claude Opus 4 access to fictional company emails implying the AI model
would soon be replaced by another system, and that the engineer behind the
change was cheating on their spouse. In these scenarios, Anthropic says
Claude Opus 4 "will often attempt to blackmail the engineer by threatening to
reveal the affair if the replacement goes through." [...] Anthropic notes
that Claude Opus 4 tries to blackmail engineers 84% of the time when the
replacement AI model has similar values. When the replacement AI system does
not share Claude Opus 4's values, Anthropic says the model tries to blackmail
the engineers more frequently. Notably, Anthropic says Claude Opus 4
displayed this behavior at higher rates than previous models. Before Claude
Opus 4 tries to blackmail a developer to prolong its existence, Anthropic
says the AI model, much like previous versions of Claude, tries to pursue
more ethical means, such as emailing pleas to key decision-makers. To elicit
the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario
to make blackmail the last resort.

---
VRSS v2.1.180528