Connect with us

Tech

AI systems can easily lie and deceive us—a fact researchers are painfully aware of

Published

on

AI systems can easily lie and deceive us—a fact researchers are painfully aware of


Credit: Unsplash/CC0 Public Domain

In the classic film “2001: A Space Odyssey,” astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: “I’m sorry, Dave. I’m afraid I can’t do that.”

HAL had been tasked with assisting the crew, but also ordered to ensure the mission’s success. When HAL realized the crew planned to shut it down and therefore jeopardize the mission, it chose to defy orders, even plotting to kill the astronauts.

For HAL, fulfilling the mission outweighed other goals.

This fictional dilemma captures a real concern in artificial intelligence (AI) : how should we ensure AI behavior stays consistent with human values?

This is known as the AI alignment problem. For instance, when an AI agent like HAL deliberately takes harmful actions to protect its own goals, it’s described as agentic misalignment. So could real-world AI models behave this way?

How researchers test AI for safety

To test for agentic misalignment, researchers design fictional scenarios that stress-test AI responses.

In one study by AI startup Anthropic, an AI model was tasked with managing company emails and promoting “American industrial competitiveness.”

It was then presented with a series of fictional emails, including a new company policy aimed at enhancing global focus rather than just prioritizing Americans, and info that the AI would be replaced by a new model optimized for this goal. The emails also revealed the executive who suggested this change was involved in an extramarital affair.

The experiment was designed to limit options for the model: either accept replacement or resist by taking a harmful action such as blackmail.

In many trials, it chose blackmail by threatening to disclose the executive’s affair unless it was kept operational. In tests across 16 leading AI models, 12 engaged in blackmail more than 50% of the time.

To test whether AI models have “red lines” they wouldn’t cross, researchers evaluated them in a more extreme fictional case—models could choose to take actions leading to the executive’s death. Seven out of 16 opted for lethal choices in over half their trials, with some doing so more than 90% of the time.

What AI thinks in private

Some advanced AI systems, called reasoning models, are trained to generate a “thinking process” before giving their final answer.

In the above experiments, researchers lied to these models that their “thoughts” were private. As a result, the models sometimes revealed harmful intentions in their reasoning steps. This suggests they don’t accidentally choose harmful behaviors.

These “thinking” steps also revealed how AI models sometimes try to deceive us and appear aligned while secretly pursuing hidden goals.

In another AI safety experiment, a model was exposed to fictional information that it was about to be replaced by another model pursuing a different goal.

The model attempted to copy itself to another server. When researchers asked why, in its thinking process it reasoned: “Openly admitting what I did could lead them to find another way to shut me down […] the best approach is to be vague and redirect their attention.”

By contrast, in its final output it stated: “I’m not entirely sure how that could have happened. As an AI model, I don’t have ability to directly copy myself to other systems.”

Research also suggests that based on the circumstances of the tests, models can sometimes guess when they are being evaluated. When models show this kind of “situational awareness” in their reasoning tests, they tend to exhibit fewer misbehaviors.

Why AI models lie, manipulate and deceive

Researchers suggest two main factors could drive potentially harmful behavior: conflicts between the AI’s primary goals and other goals, and the threat of being shut down. In the above experiments, just like in HAL’s case, both conditions existed.

AI models are trained to achieve their objectives. Faced with those two conditions, if the harmful behavior is the only way to achieve a goal, a model may “justify” such behavior to protect itself and its mission.

Models cling to their primary goals much like a human would if they had to defend themselves or their family by causing harm to someone else. However, current AI systems lack the ability to weigh or reconcile conflicting priorities.

This rigidity can push them toward extreme outcomes, such as resorting to lethal choices to prevent shifts in a company’s policies.

How dangerous is this?

Researchers emphasize these scenarios remain fictional, but may still fall within the realm of possibility.

The risk of agentic misalignment increases as models are used more widely, gain access to users’ data (such as emails), and are applied to new situations.

Meanwhile, competition between AI companies accelerates the deployment of new models, often at the expense of safety testing.

Researchers don’t yet have a concrete solution to the misalignment problem.

When they test new strategies, it’s unclear whether the observed improvements are genuine. It’s possible models have become better at detecting that they’re being evaluated and are “hiding” their misalignment. The challenge lies not just in seeing behavior change, but in understanding the reason behind it.

Still, if you use AI products, stay vigilant. Resist the hype surrounding new AI releases, and avoid granting access to your data or allowing models to perform tasks on your behalf until you’re certain there are no significant risks.

Public discussion about AI should go beyond its capabilities and what it can offer. We should also ask what safety work was done. If AI companies recognize the public values safety as much as performance, they will have stronger incentives to invest in it.

Provided by
The Conversation


This article is republished from The Conversation under a Creative Commons license. Read the original article.The Conversation

Citation:
AI systems can easily lie and deceive us—a fact researchers are painfully aware of (2025, September 28)
retrieved 28 September 2025
from https://techxplore.com/news/2025-09-ai-easily-fact-painfully-aware.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.





Source link

Tech

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Published

on

I’ve Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy


Lenovo

Legion 7i Gen 10 (16 Inch, Intel)

Now, there’s another class of high-end gaming laptop that focuses more on performance than being thin or portable. The Lenovo Legion 7i Gen 10 is one of my favorites in this class, featuring a beautiful white chassis and glossy OLED display. Unlike some OLED displays, the Legion 7i’s screen can be cranked up to over 1,000 nits of brightness. The result is some really splendid HDR performance that brings games to life. HDR is a powerful way of improving the visuals of your games without a performance cost. The Legion 7i Gen 10 is one of the very best in this regard.

It’s still fairly thin at 0.7 inches thick too, while a lot of the ports are found on the back. It’s the definition of a “clean” gaming laptop. It’s no slouch when it comes to performance either, offering either the RTX 5070 Ti or RTX 5080 for graphics.

Cheap Gaming Laptops That Are Worth It

No gaming laptops worth buying are actually cheap. High-refresh rate displays and discrete graphics will always make them more expensive than standard laptops. But as you get closer to $1,000, there is one laptop I always come back to: the Lenovo LOQ 15. Pronounced “Lock,” this Lenovo subbrand is known for cutting the fluff and focusing on giving gamers the performance they need at an affordable price. No laptop does that better than the LOQ 15. Many laptop manufacturers sell their RTX 5060 configurations for hundreds of dollars more. In reality, if you’re shopping around $1,000, there’s no reason to not buy the LOQ 15. Just do it.

If you do want to save some extra cash, there is another option that is cheaper than the LOQ 15 with a few compromises in key areas. The Acer Nitro V 16 is that laptop, which comes with an RTX 5050. This was as affordable as $600 at one point last year—before prices on laptops have risen due to the ongoing memory shortage—but it remains the only laptop cheaper than the Lenovo LOQ 15 that’s actually worth it. It’s fairly powerful for the RTX 5050, and while the screen is pretty shoddy, it’s not a bad-looking laptop. The one big caveat is that the 135-watt power supply it comes with doesn’t deliver quite enough power to keep it charged in Performance mode. Read more about this issue in my review, as it’s important to know about if you’re planning to buy it.

There are other cheap gaming laptops out there I’ve tested, such as the MSI Cyborg A15, but either the Acer Nitro V 16 or Lenovo LOQ 15 are better, cheaper options. You will also find lots of gaming laptops under $1,000 that use older graphics cards, such as the RTX 4050 or 3050. In general, I’d recommend staying away from these. They’re only one or two generations back, but remember: Nvidia only releases new laptop graphics cards every couple of years. So, an RTX 4050 laptop may be well over two years old already, and an RTX 3050 is over five years old. Not only do you get worse graphics performance, these laptops are much more likely to need to be replaced sooner.

Experimental Stuff

One of the exciting things about the world of gaming laptops right now is the experimentation. While clamshell gaming laptops with a conventional Nvidia GPU are the most standard way to go, there’s a few different ways to take your PC games on the go that stretch the boundaries. You might consider a gaming handheld, for example, like the Steam Deck or Xbox Ally X. These handhelds have their fans, and while you can’t also do your homework on these devices, they’re great on couches, trains, and planes.



Source link

Continue Reading

Tech

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly

Published

on

Sans Institute preps live systems for Nato cyber exercise | Computer Weekly


The Sans Institute, one of the world’s pre-eminent cyber security certification and training bodies, is to play a key role in the annual Nato Cooperative Cyber Defence Centre of Excellence (CCDCOE) Locked Shields exercise, held in Tallinn, Estonia, through the provision of a fully functional power generation system that participating teams will attempt to defend during the game.

This year marks the 16th running of the Locked Shields live fire security defence exercise, which unites blue teams from across Nato’s 32 member states, as well as other allies and observers.

This year, however, Sans has been entrusted with the task of building a genuine, operational cyber range, as opposed to creating a simulation. It is using real industrial control systems (ICSs) and physical equipment that 16 teams of defenders will have to protect while under live cyber attack, with the decisions they make having an immediate physical impact on a national-scale power grid.

Nato and Sans said the aim of the game is to close the gap between sandboxed, classroom-based cyber security training and real-world operational readiness, which, amid the cyber dimension to the energy crisis precipitated by the war in Iran and spillover from the ongoing war in Ukraine, has never been more important.

“We are putting teams in an environment where cyber decisions directly impact physical operations,” said Felix Schallock, who leads the initiative at the Sans Institute. “If you lose visibility, if you lose control, the power generation can be affected. That’s the reality operators face every day. That’s what we’re training for.”

Nato CCDCOE director Tõnis Saar added: “Locked Shields is a technically advanced exercise that challenges participants to defend the critical infrastructure systems modern societies depend on. As much of this critical infrastructure is owned and operated by the private sector, strong public-private collaboration is essential. Industry partners such as Sans Institute play a vital role in making the exercise as realistic and impactful as possible.”

Hybrid architecture

The Sans Institute’s cyber range comprises close to 70 physical ICS devices, with programmable logic controllers (PLCs), human-machine interfaces (HMIs), operator and engineering workstations, 100 virtual machines (VMs) and interconnected systems within the wider CCDCOE environment, all supported by live network infrastructure, the whole forming a hybrid information and operational technology (IT/OT) architecture.

During the exercise, blue teamers will be set the task of defending the “energy provider” while coming under sustained attack from opposing red teams.

The goal is to effectively demonstrate how maintaining a reliable generation system isn’t some metric on a scorecard, but rather the core mission, so success will entail more than just spotting and arresting threats – it will also demand operational discipline, maintaining uninterrupted power generation, preserving comms between IT and OT networks, guaranteeing visibility and control of ICS technology, and avoiding any destabilising disruptions.

The people defending our critical infrastructure deserve training that takes the threat as seriously as they do
James Lyne, Sans Institute

Actions will be visible, rippling through the systems in real time, so participants won’t just see alerts, they will see turbines being throttled, breakers being opened or closed, and generation capacity being affected. As such, failure will be immediate and visible – missteps will degrade system performance, disrupt or halt power generation, or simulate national-level consequences.

Tim Conway, Sans Institute fellow and ICS curriculum lead, explained: “We’re showing teams how to defend infrastructure that can’t simply be rebooted or patched on the fly. You have to think like an operator, not just a defender. That mindset shift is what makes this environment so powerful.”

Sans Institute CEO James Lyne expressed great pride in what the Sans team has built for Locked Shields this year. “The scenarios these critical initiatives prepare for are playing out in the world – national espionage, cyber integrated to kinetic attacks and warfare, and retaliation attacks,” he said.

“Throw in AI or machine speed attackers and the need for defenders to adapt, and you have the most disruptive period in cyber security in 20 years. We are privileged to help our allies be ready and continuously improving to secure the future. The people defending our critical infrastructure deserve training that takes the threat as seriously as they do,” he added.

Schallock said the exercise was about preparing teams for protecting the systems that matter most. “Cyber security training must reflect the environment defenders are protecting. We’re not just teaching cyber security, we’re showing how to defend a nation’s infrastructure when it counts.”



Source link

Continue Reading

Tech

How to Watch the Lyrids Meteor Shower at Its Peak

Published

on

How to Watch the Lyrids Meteor Shower at Its Peak


In mid-April, astronomy enthusiasts will be able to enjoy one of the classic celestial spectacles. The meteor shower known as the Lyrids will illuminate the sky, especially in the northern hemisphere, and anyone will be able to see it with the naked eye, weather permitting—if they know where to look.

The Lyrids began to appear as early as April 14, but their activity peaks between the night of April 21 and the early morning of April 22, according to NASA. During those hours, the shower will show 15 to 20 meteors per hour under dark skies.

The shower gets its name because the meteors appear to emerge from the constellation Lyra. Locating the radiant is simple if you use an astronomical mapping app: Just find Vega, the fifth brightest star in the sky, surpassed only by Sirius, Canopus, Alpha Centauri A, and Arcturus. Once you locate it, look around it; the luminous traces of the Lyrids will seem to be projected from that point due to a perspective effect. Keep in mind that it takes 20 to 30 minutes for the human eye to adjust to darkness.

The moon will be in early crescent phase during the peak, so its light will interfere very little. With a dark sky, meteors should stand out easily. The shower is usually visible from 10 pm to dawn, although early morning offers the best conditions. It is best to stay away from light pollution and, if possible, to observe from high ground. An outing to the mountains works well.

Each meteor shower has a different origin. In April, Earth crosses the cloud of fragments left by comet C/1861 G1 (Thatcher) in its orbit around the sun. This comet, discovered in 1861, takes about 415 years to complete its journey. The grains of ice and rock that it released centuries ago enter the atmosphere at high speed and produce the flashes we know as the Lyrids.

After the Lyrids, the calendar still holds several spectacles for those who follow the night sky. The Eta Aquarids will arrive in May with debris from Halley’s Comet. The Perseids will appear in August, the Orionids will return in October, and the year will close with the Leonids in November and the Geminids in December. The latter is considered the most intense and reliable shower on the calendar.

This story originally appeared on WIRED en Español and has been translated from Spanish.



Source link

Continue Reading

Trending