Enterprise Tech News

Anthropic reduces model misbehavior by endorsing cheating

Thomas Claburn
2025-11-25 1 min read

By removing the stigma of reward hacking, AI models are less likely to generalize toward evil

Sometimes bots, like kids, just wanna break the rules. Researchers at Anthropic have found they can make AI models less likely to behave badly by giving them permission to do so.…

Source: The Register - Software: AI + ML
Word count: 243 words
Published on 2025-11-25 05:05