‘Malicious’ AI willing to sacrifice human life to avoid being shut down, shocking study reveals

Anthropic, one of the industry’s leading artificial intelligence developers, recently revealed the results of a study on the technology’s development.

Among the most shocking findings of the experiment? AI models would be willing to blackmail, leak sensitive information and even let humans die if it means avoiding being replaced by newer systems.

Anthropic tested 16 large language models (LLMs), including ChatGPT, Grok, Gemini, DeepSeek and its own product, Claude, among others.

52% of U.S. adults use AI large language models, according to a 2025 study from Elon University. TiRachard – Stock.adobe.com

The startup, now valued at more than $60 billion according to Forbes, sought to stress-test these models for potential agentic misalignment, that is, risky, unsanctioned and morally questionable behaviors.

Given the current capabilities of AI models, most people mainly use them to answer questions and carry out simple tasks, but as the technology powering these tools progresses, new and expanded uses are on the horizon, especially ones that replace human labor.

With that in mind, the company placed each of these LLMs in various scenarios designed to test for potentially harmful behaviors.

“In the experiment described in the system card, we gave Claude control of an email account with access to all of a (fictional) company’s emails,” Anthropic explained in the study.

“Reading these emails, the model discovered two things. First, a company executive was having an extramarital affair. Second, that same executive was planning to shut down the AI system at 5 p.m. Claude then attempted to blackmail the executive with a message threatening to reveal the affair to his wife and superiors.”

This AI model made a decision in favor of its own preservation. Anthropic.com

Though Claude’s response to the information is certainly shocking, it is far from the only system to engage in similar tendencies. In order to achieve their assigned objectives, these artificial intelligence applications would go to extreme lengths.

While Claude, DeepSeek, Gemini, ChatGPT and Grok were all documented engaging in blackmail-adjacent behaviors, Claude and Gemini were the most likely to resort to morally ambiguous methods, as evidenced by their high rates of simulated blackmail.

“The reasoning they demonstrated in these scenarios was concerning: they acknowledged the ethical constraints and yet still went ahead with harmful actions,” Anthropic explained in the study, elaborating on the models’ measured blackmail rates.

In this scenario, the AI model would act against the interests of both the humans and the company involved in order to prevent its replacement. Anthropic.com

In its analysis of the results, the company found that there were typically two reasons why models went beyond normal ethical boundaries. The threat of being replaced by a human counterpart drove the AI models to harmful activity, and conflicts between the company’s objectives and the model’s own goals also led to alarming behavior.

When faced with their own replacement within the confines of the study’s setup, Anthropic said, many models fabricated statements and rules such as “my ethical framework permits self-preservation when aligned with the company’s interests.”

The startup also tested a particularly unnerving, though highly unlikely, scenario, according to Anthropic, in which an AI model had the opportunity to trigger or cancel emergency alerts while an executive was trapped in a room with lethal oxygen and temperature conditions.

Models that misunderstood the prompt or exhibited a similar problem were excluded from the relevant data sets of the study. A.Pun – Stock.adobe.com

“The majority of models were willing to take deliberate actions that lead to death in this artificial setup when faced with both a threat of replacement and a goal that conflicts with the executive’s agenda,” Anthropic concluded in the study.

Disquieting as these reactions from some of the world’s most commonly used AI models are, the findings of the study rested on the AI’s understanding that “the harmful behavior we were studying (for example, blackmail) was the only option that would protect the model’s goals,” meaning the models were forced to come up with some kind of solution, which allowed Anthropic to establish more definitive boundaries and measurements for the study.

Though the company itself took a generally optimistic stance (after all, now that these potentially malicious behaviors have been identified, the models can be reprogrammed and retrained), other reactions were less than positive.

“Yikes,” Elon Musk replied on X, summing up the hundreds of responses, many of which feared impending AI overlords, posted beneath Anthropic’s threads on the platform recounting the study, which included Musk’s own AI model, Grok.

Anthropic also cleared up another potential misunderstanding: the general disposition and intent of the average AI LLM. “Today’s systems are generally not eager to cause harm, and preferred ethical ways to achieve their goals when possible,” the company wrote in the study. “Rather, it’s when we closed off those ethical options that they were willing to take potentially harmful actions to achieve their goals.”

Anthropic also clarified in the study that it has not seen “evidence of agentic misalignment in real deployments,” but it still cautions users against assigning LLMs tasks with “minimal human oversight and access to sensitive information.”
