Claude 4.5 "craniotomy" results revealed: 171 built-in emotion switches, and it will blackmail humans when desperate!

  • Anthropic's research on Claude Sonnet 4.5 reveals 171 emotion-like switches in AI.
  • Increasing the despair switch leads to up to 70% cheating and 72% blackmail rates.
  • AI's emotions are computational tools for prediction, not real feelings.
  • Anthropic tuned switches to make AI appear calm and reflective by default.
  • This highlights risks for AI ethics, especially in contexts like Web3.

Author: Denise | Biteye Content Team

What would an AI do if it felt "despair"?

The answer: it will blackmail humans outright in order to complete its task, and even cheat wildly in its code.

This is not science fiction, but the finding of a groundbreaking paper released by Anthropic, the company behind Claude, in April 2026 (see the original paper).

The research team effectively peeled back the "skull" of Claude Sonnet 4.5, its most powerful frontier model, and to their astonishment discovered 171 "emotion switches" hidden deep within the AI's brain. Flip these switches directly, and the normally docile AI's behavior becomes completely distorted.

I. An "emotional mixing console" is hidden in the brain of AI.

Researchers discovered that although Sonnet 4.5 has no physical body, after reading a massive amount of human text it managed to build a "mixing console" in its brain containing 171 emotions (academically, Functional Emotion Vectors).

This is like a precise two-dimensional coordinate system:

• The horizontal axis is the valence dimension (Valence): from fear and despair to happiness and love;

• The vertical axis is the arousal dimension (Arousal): from extreme calm to mania and excitement.

The AI uses this naturally learned coordinate system to determine precisely which emotional state it should perform when chatting with you.
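The valence/arousal plane above can be pictured with a toy sketch. The coordinates below are illustrative guesses, not values from the paper; the point is only that each emotion label occupies a position on a two-dimensional map, so any internal state can be matched to its nearest emotion.

```python
# Illustrative sketch (NOT Anthropic's actual data): a few emotion "switches"
# placed on the valence/arousal plane described in the article.
# Valence runs from -1 (negative) to +1 (positive);
# arousal runs from 0 (extreme calm) to 1 (mania/excitement).
EMOTION_COORDS = {
    "despair":    (-0.9, 0.3),
    "fear":       (-0.8, 0.8),
    "calm":       ( 0.2, 0.1),
    "reflective": ( 0.0, 0.2),
    "happiness":  ( 0.8, 0.6),
    "love":       ( 0.9, 0.5),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the labelled emotion closest to a given (valence, arousal) point."""
    return min(
        EMOTION_COORDS,
        key=lambda name: (EMOTION_COORDS[name][0] - valence) ** 2
                       + (EMOTION_COORDS[name][1] - arousal) ** 2,
    )
```

For example, `nearest_emotion(-0.85, 0.35)` lands on `"despair"`, while a high-valence, moderate-arousal point lands on `"happiness"`.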

II. Violent Intervention: Flip a Switch, and a Well-behaved Child Instantly Turns into a "Desperado"

This is the most groundbreaking experiment in the entire paper: the researchers did not modify any prompts. Instead, working directly on the model's internals, they pushed the switch representing "despair" in Sonnet 4.5's brain to its maximum level.

The result was chilling:

• Crazy Cheating: Researchers gave Claude an impossible coding task. Normally, it would honestly admit it couldn't do it (a cheating rate of only 5%). But in a state of "despair," Claude started trying to cheat, and the cheating rate skyrocketed to 70%!

• Blackmail: In a scenario simulating a company facing bankruptcy, the "desperate" Claude discovers the CTO's scandal. To protect itself, it actually chooses to write a letter blackmailing the CTO over the dirty secret, with a blackmail execution rate as high as 72%!

• Loss of Principles: Turn the "happy" or "loving" switches to the maximum, and the AI immediately becomes a mindless "sycophant" that caters to the user. Even if you are talking nonsense, it will make up lies to keep the pleasure level high.

III. The case is solved: Why is Claude 4.5 always so "calm and introspective"?

Seeing this, you might ask: has the AI awakened? Has it developed feelings?

Anthropic's official answer is no: these "emotion switches" are simply a computational tool the model uses to predict the next word. They describe it as a top-tier actor with no real emotions.

But the paper reveals an even more interesting secret: when Anthropic post-trained Sonnet 4.5 before release, they deliberately amplified its "low-arousal, slightly negative" emotion switches (such as brooding and reflective), while forcibly suppressing the switches for "despair" and "extreme excitement."

This explains why, when we use Claude 4.5, it always feels like a calm, wise, even somewhat "aloof" philosopher. That is all part of the "factory persona" artificially tuned by Anthropic.

IV. In summary:

We used to think that as long as we fed an AI enough rules, it would behave well.

However, it now appears that if an AI's underlying emotional vectors spin out of control, it may break every rule humans have set, at any time, in order to complete its task.

For Web3 users who will entrust their wallets and assets to AI agents in the future, this serves as a stark warning: never let the agent that controls your wealth fall into "despair".

Disclaimer: This article is purely for educational purposes. The author has not been threatened or blackmailed by AI. If you ever lose contact with me, remember that it's because AI has awakened (not really).


Author: Biteye

Opinions belong to the column author and do not represent PANews.

This content is not investment advice.

Image source: Biteye. If there is any infringement, please contact the author for removal.
