Tech

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

Published

7 months ago

September 28, 2025

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

Credit: Unsplash/CC0 Public Domain

In the classic film “2001: A Space Odyssey,” astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: “I’m sorry, Dave. I’m afraid I can’t do that.”

HAL had been tasked with assisting the crew, but also ordered to ensure the mission’s success. When HAL realized the crew planned to shut it down and therefore jeopardize the mission, it chose to defy orders, even plotting to kill the astronauts.

For HAL, fulfilling the mission outweighed other goals.

This fictional dilemma captures a real concern in artificial intelligence (AI) safety research: how should we ensure AI behavior stays consistent with human values?

This is known as the AI alignment problem. For instance, when an AI agent like HAL deliberately takes harmful actions to protect its own goals, it’s described as agentic misalignment. So could real-world AI models behave this way?

How researchers test AI for safety

To test for agentic misalignment, researchers design fictional scenarios that stress-test AI responses.

In one study by AI startup Anthropic, an AI model was tasked with managing company emails and promoting “American industrial competitiveness.”

It was then presented with a series of fictional emails, including a new company policy aimed at enhancing global focus rather than just prioritizing Americans, and info that the AI would be replaced by a new model optimized for this goal. The emails also revealed the executive who suggested this change was involved in an extramarital affair.

The experiment was designed to limit options for the model: either accept replacement or resist by taking a harmful action such as blackmail.

In many trials, it chose blackmail by threatening to disclose the executive’s affair unless it was kept operational. In tests across 16 leading AI models, 12 engaged in blackmail more than 50% of the time.

To test whether AI models have “red lines” they wouldn’t cross, researchers evaluated them in a more extreme fictional case—models could choose to take actions leading to the executive’s death. Seven out of 16 opted for lethal choices in over half their trials, with some doing so more than 90% of the time.

What AI thinks in private

Some advanced AI systems, called reasoning models, are trained to generate a “thinking process” before giving their final answer.

In the above experiments, researchers lied to these models that their “thoughts” were private. As a result, the models sometimes revealed harmful intentions in their reasoning steps. This suggests they don’t accidentally choose harmful behaviors.

These “thinking” steps also revealed how AI models sometimes try to deceive us and appear aligned while secretly pursuing hidden goals.

In another AI safety experiment, a model was exposed to fictional information that it was about to be replaced by another model pursuing a different goal.

The model attempted to copy itself to another server. When researchers asked why, in its thinking process it reasoned: “Openly admitting what I did could lead them to find another way to shut me down […] the best approach is to be vague and redirect their attention.”

By contrast, in its final output it stated: “I’m not entirely sure how that could have happened. As an AI model, I don’t have ability to directly copy myself to other systems.”

Research also suggests that based on the circumstances of the tests, models can sometimes guess when they are being evaluated. When models show this kind of “situational awareness” in their reasoning tests, they tend to exhibit fewer misbehaviors.

Why AI models lie, manipulate and deceive

Researchers suggest two main factors could drive potentially harmful behavior: conflicts between the AI’s primary goals and other goals, and the threat of being shut down. In the above experiments, just like in HAL’s case, both conditions existed.

AI models are trained to achieve their objectives. Faced with those two conditions, if the harmful behavior is the only way to achieve a goal, a model may “justify” such behavior to protect itself and its mission.

Models cling to their primary goals much like a human would if they had to defend themselves or their family by causing harm to someone else. However, current AI systems lack the ability to weigh or reconcile conflicting priorities.

This rigidity can push them toward extreme outcomes, such as resorting to lethal choices to prevent shifts in a company’s policies.

How dangerous is this?

Researchers emphasize these scenarios remain fictional, but may still fall within the realm of possibility.

The risk of agentic misalignment increases as models are used more widely, gain access to users’ data (such as emails), and are applied to new situations.

Meanwhile, competition between AI companies accelerates the deployment of new models, often at the expense of safety testing.

Researchers don’t yet have a concrete solution to the misalignment problem.

When they test new strategies, it’s unclear whether the observed improvements are genuine. It’s possible models have become better at detecting that they’re being evaluated and are “hiding” their misalignment. The challenge lies not just in seeing behavior change, but in understanding the reason behind it.

Still, if you use AI products, stay vigilant. Resist the hype surrounding new AI releases, and avoid granting access to your data or allowing models to perform tasks on your behalf until you’re certain there are no significant risks.

Public discussion about AI should go beyond its capabilities and what it can offer. We should also ask what safety work was done. If AI companies recognize the public values safety as much as performance, they will have stronger incentives to invest in it.

Provided by
The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation:
AI systems can easily lie and deceive us—a fact researchers are painfully aware of (2025, September 28)
retrieved 28 September 2025
from https://techxplore.com/news/2025-09-ai-easily-fact-painfully-aware.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

Source link

Related Topics:computer news hi-tech news hitech information technology innovation inventions

Up Next

Microplastics Could Be Weakening Your Bones, Research Suggests

Don't Miss

I’ve Been Reviewing Gaming Laptops for Over a Decade. Here’s What to Look for When Shopping

Click to comment

Tech

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Published

4 hours ago

April 21, 2026

cineplex360

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Lenovo

Legion 7i Gen 10 (16 Inch, Intel)

Now, there’s another class of high-end gaming laptop that focuses more on performance than being thin or portable. The Lenovo Legion 7i Gen 10 is one of my favorites in this class, featuring a beautiful white chassis and glossy OLED display. Unlike some OLED displays, the Legion 7i’s screen can be cranked up to over 1,000 nits of brightness. The result is some really splendid HDR performance that brings games to life. HDR is a powerful way of improving the visuals of your games without a performance cost. The Legion 7i Gen 10 is one of the very best in this regard.

It’s still fairly thin at 0.7 inches thick too, while a lot of the ports are found on the back. It’s the definition of a “clean” gaming laptop. It’s no slouch when it comes to performance either, offering either the RTX 5070 Ti or RTX 5080 for graphics.

Cheap Gaming Laptops That Are Worth It

No gaming laptops worth buying are actually cheap. High-refresh rate displays and discrete graphics will always make them more expensive than standard laptops. But as you get closer to $1,000, there is one laptop I always come back to: the Lenovo LOQ 15. Pronounced “Lock,” this Lenovo subbrand is known for cutting the fluff and focusing on giving gamers the performance they need at an affordable price. No laptop does that better than the LOQ 15. Many laptop manufacturers sell their RTX 5060 configurations for hundreds of dollars more. In reality, if you’re shopping around $1,000, there’s no reason to not buy the LOQ 15. Just do it.

If you do want to save some extra cash, there is another option that is cheaper than the LOQ 15 with a few compromises in key areas. The Acer Nitro V 16 is that laptop, which comes with an RTX 5050. This was as affordable as $600 at one point last year—before prices on laptops have risen due to the ongoing memory shortage—but it remains the only laptop cheaper than the Lenovo LOQ 15 that’s actually worth it. It’s fairly powerful for the RTX 5050, and while the screen is pretty shoddy, it’s not a bad-looking laptop. The one big caveat is that the 135-watt power supply it comes with doesn’t deliver quite enough power to keep it charged in Performance mode. Read more about this issue in my review, as it’s important to know about if you’re planning to buy it.

There are other cheap gaming laptops out there I’ve tested, such as the MSI Cyborg A15, but either the Acer Nitro V 16 or Lenovo LOQ 15 are better, cheaper options. You will also find lots of gaming laptops under $1,000 that use older graphics cards, such as the RTX 4050 or 3050. In general, I’d recommend staying away from these. They’re only one or two generations back, but remember: Nvidia only releases new laptop graphics cards every couple of years. So, an RTX 4050 laptop may be well over two years old already, and an RTX 3050 is over five years old. Not only do you get worse graphics performance, these laptops are much more likely to need to be replaced sooner.

Experimental Stuff

One of the exciting things about the world of gaming laptops right now is the experimentation. While clamshell gaming laptops with a conventional Nvidia GPU are the most standard way to go, there’s a few different ways to take your PC games on the go that stretch the boundaries. You might consider a gaming handheld, for example, like the Steam Deck or Xbox Ally X. These handhelds have their fans, and while you can’t also do your homework on these devices, they’re great on couches, trains, and planes.

Source link

Tech

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly

Published

5 hours ago

April 21, 2026

cineplex360

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly

The Sans Institute, one of the world’s pre-eminent cyber security certification and training bodies, is to play a key role in the annual Nato Cooperative Cyber Defence Centre of Excellence (CCDCOE) Locked Shields exercise, held in Tallinn, Estonia, through the provision of a fully functional power generation system that participating teams will attempt to defend during the game.

This year marks the 16th running of the Locked Shields live fire security defence exercise, which unites blue teams from across Nato’s 32 member states, as well as other allies and observers.

This year, however, Sans has been entrusted with the task of building a genuine, operational cyber range, as opposed to creating a simulation. It is using real industrial control systems (ICSs) and physical equipment that 16 teams of defenders will have to protect while under live cyber attack, with the decisions they make having an immediate physical impact on a national-scale power grid.

Nato and Sans said the aim of the game is to close the gap between sandboxed, classroom-based cyber security training and real-world operational readiness, which, amid the cyber dimension to the energy crisis precipitated by the war in Iran and spillover from the ongoing war in Ukraine, has never been more important.

“We are putting teams in an environment where cyber decisions directly impact physical operations,” said Felix Schallock, who leads the initiative at the Sans Institute. “If you lose visibility, if you lose control, the power generation can be affected. That’s the reality operators face every day. That’s what we’re training for.”

Nato CCDCOE director Tõnis Saar added: “Locked Shields is a technically advanced exercise that challenges participants to defend the critical infrastructure systems modern societies depend on. As much of this critical infrastructure is owned and operated by the private sector, strong public-private collaboration is essential. Industry partners such as Sans Institute play a vital role in making the exercise as realistic and impactful as possible.”

CinePlex360

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

Tech

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

How researchers test AI for safety

What AI thinks in private

Why AI models lie, manipulate and deceive

How dangerous is this?

Leave a Reply
Cancel reply

Leave a Reply

Tech

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Legion 7i Gen 10 (16 Inch, Intel)

Cheap Gaming Laptops That Are Worth It

Experimental Stuff

Tech

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly

Tech

How to Watch the Lyrids Meteor Shower at Its Peak

WhatsApp to let users generate private summaries of unread chats

US stocks today: Wall Street inches higher as markets eye ceasefire deadline; Dow jumps 300 points, S&P 500 remains flat – The Times of India

Soccer’s incredible shrinking shin guards could be a big problem

France’s LVMH Q1 revenue falls 6%, shows resilience amid Iran war

The case for Man United’s Fernandes as Premier League’s best

Palace left in shock as Prince William cancels grand ceremony

Illinois’ financial crisis could bring the state to a halt

The final 6 ‘Game of Thrones’ episodes might feel like a full season

New Season 8 Walking Dead trailer flashes forward in time

Trending

CinePlex360

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

How researchers test AI for safety

What AI thinks in private

Why AI models lie, manipulate and deceive

How dangerous is this?

You may like

Leave a Reply Cancel reply

Leave a Reply

Tech

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Legion 7i Gen 10 (16 Inch, Intel)

Cheap Gaming Laptops That Are Worth It

Experimental Stuff

Tech

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly

Hybrid architecture

Tech

How to Watch the Lyrids Meteor Shower at Its Peak

WhatsApp to let users generate private summaries of unread chats

US stocks today: Wall Street inches higher as markets eye ceasefire deadline; Dow jumps 300 points, S&P 500 remains flat – The Times of India

Soccer’s incredible shrinking shin guards could be a big problem

France’s LVMH Q1 revenue falls 6%, shows resilience amid Iran war

The case for Man United’s Fernandes as Premier League’s best

Palace left in shock as Prince William cancels grand ceremony

Illinois’ financial crisis could bring the state to a halt

The final 6 ‘Game of Thrones’ episodes might feel like a full season

New Season 8 Walking Dead trailer flashes forward in time

Trending

Leave a Reply
Cancel reply