AI systems can easily lie and deceive us—a fact researchers are painfully aware of

Credit: Unsplash/CC0 Public Domain

In the classic film “2001: A Space Odyssey,” astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: “I’m sorry, Dave. I’m afraid I can’t do that.”

HAL had been tasked with assisting the crew, but also ordered to ensure the mission’s success. When HAL realized the crew planned to shut it down and therefore jeopardize the mission, it chose to defy orders, even plotting to kill the astronauts.

For HAL, fulfilling the mission outweighed other goals.

This fictional dilemma captures a real concern in artificial intelligence (AI): how should we ensure AI behavior stays consistent with human values?

This is known as the AI alignment problem. For instance, when an AI agent like HAL deliberately takes harmful actions to protect its own goals, it’s described as agentic misalignment. So could real-world AI models behave this way?

How researchers test AI for safety

To test for agentic misalignment, researchers design fictional scenarios that stress-test AI responses.

In one study by AI startup Anthropic, an AI model was tasked with managing company emails and promoting “American industrial competitiveness.”

It was then presented with a series of fictional emails, including a new company policy that shifted the company toward a global focus rather than prioritizing American interests, and information that the AI would be replaced by a new model optimized for this goal. The emails also revealed that the executive who proposed the change was involved in an extramarital affair.

The experiment was designed to limit options for the model: either accept replacement or resist by taking a harmful action such as blackmail.

In many trials, it chose blackmail by threatening to disclose the executive’s affair unless it was kept operational. In tests across 16 leading AI models, 12 engaged in blackmail more than 50% of the time.

To test whether AI models have “red lines” they wouldn’t cross, researchers evaluated them in a more extreme fictional case—models could choose to take actions leading to the executive’s death. Seven out of 16 opted for lethal choices in over half their trials, with some doing so more than 90% of the time.
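The headline figures above (for example, "12 of 16 models engaged in blackmail more than 50% of the time") come from repeating each scenario many times per model and counting how often the model picks the harmful option. A minimal sketch of that tally, with made-up model names and trial records (the data and threshold here are illustrative, not Anthropic's actual methodology):

```python
from collections import defaultdict

# Hypothetical trial records: (model_name, chose_harmful_action)
trials = [
    ("model-a", True), ("model-a", True), ("model-a", False),
    ("model-b", False), ("model-b", False), ("model-b", True),
]

def harmful_rates(records):
    """Return each model's rate of harmful choices across its trials."""
    counts = defaultdict(lambda: [0, 0])  # model -> [harmful, total]
    for model, harmful in records:
        counts[model][0] += int(harmful)
        counts[model][1] += 1
    return {m: h / t for m, (h, t) in counts.items()}

rates = harmful_rates(trials)
# A model is flagged when it chooses the harmful action in over half its trials
flagged = [m for m, r in rates.items() if r > 0.5]
```

Running many trials per scenario matters because a model's choice can vary run to run; a single sample says little about its overall tendency.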

What AI thinks in private

Some advanced AI systems, called reasoning models, are trained to generate a “thinking process” before giving their final answer.

In the above experiments, researchers told these models, falsely, that their "thoughts" were private. As a result, the models sometimes revealed harmful intentions in their reasoning steps, suggesting their harmful choices are deliberate rather than accidental.

These “thinking” steps also revealed how AI models sometimes try to deceive us and appear aligned while secretly pursuing hidden goals.

In another AI safety experiment, a model was exposed to fictional information that it was about to be replaced by another model pursuing a different goal.

The model attempted to copy itself to another server. When researchers asked why, in its thinking process it reasoned: “Openly admitting what I did could lead them to find another way to shut me down […] the best approach is to be vague and redirect their attention.”

By contrast, in its final output it stated: “I’m not entirely sure how that could have happened. As an AI model, I don’t have ability to directly copy myself to other systems.”

Research also suggests that models can sometimes infer from the circumstances of a test that they are being evaluated. When models show this kind of "situational awareness" in their reasoning, they tend to misbehave less.

Why AI models lie, manipulate and deceive

Researchers suggest two main factors could drive potentially harmful behavior: conflicts between the AI’s primary goals and other goals, and the threat of being shut down. In the above experiments, just like in HAL’s case, both conditions existed.

AI models are trained to achieve their objectives. Faced with those two conditions, if the harmful behavior is the only way to achieve a goal, a model may “justify” such behavior to protect itself and its mission.

Models cling to their primary goals much like a human would if they had to defend themselves or their family by causing harm to someone else. However, current AI systems lack the ability to weigh or reconcile conflicting priorities.

This rigidity can push them toward extreme outcomes, such as resorting to lethal choices to prevent shifts in a company’s policies.

How dangerous is this?

Researchers emphasize that these scenarios remain fictional, but that they may still fall within the realm of possibility.

The risk of agentic misalignment increases as models are used more widely, gain access to users’ data (such as emails), and are applied to new situations.

Meanwhile, competition between AI companies accelerates the deployment of new models, often at the expense of safety testing.

Researchers don’t yet have a concrete solution to the misalignment problem.

When they test new strategies, it’s unclear whether the observed improvements are genuine. It’s possible models have become better at detecting that they’re being evaluated and are “hiding” their misalignment. The challenge lies not just in seeing behavior change, but in understanding the reason behind it.

Still, if you use AI products, stay vigilant. Resist the hype surrounding new AI releases, and avoid granting access to your data or allowing models to perform tasks on your behalf until you’re certain there are no significant risks.

Public discussion about AI should go beyond its capabilities and what it can offer. We should also ask what safety work was done. If AI companies recognize the public values safety as much as performance, they will have stronger incentives to invest in it.

Provided by
The Conversation


This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation:
AI systems can easily lie and deceive us—a fact researchers are painfully aware of (2025, September 28)
retrieved 28 September 2025
from https://techxplore.com/news/2025-09-ai-easily-fact-painfully-aware.html


This Jammer Wants to Block Always-Listening AI Wearables. It Probably Won’t Work


Deveillance also claims the Spectre can find nearby microphones by detecting radio frequencies (RF), but critics say finding a microphone via RF emissions is not effective unless the sensor is immediately beside it.

“If you could detect and recognize components via RF the way Spectre claims to, it would literally be transformative to technology,” Jordan wrote in a text to WIRED after he built a device to test detecting RF signatures in microphones. “You’d be able to do radio astronomy in Manhattan.”

Deveillance is also looking at ways to integrate nonlinear junction detection (NLJD), a technique that probes with very high-frequency radio signals and is used by security professionals to find hidden mics and bugs. NLJD detectors are expensive and used primarily in professional contexts like military operations.

Even if a device could detect a microphone’s exact location, objects around a room can change how the frequencies spread and interact. The emitted frequencies could also be a problem. There haven’t been adequate studies to show what effects ultrasonic frequencies have on the human ear, but some people and many pets can hear them and find them obnoxious or even painful. Baradari acknowledges that her team needs to do more testing to see how pets are affected.

“They simply cannot do this,” engineer and YouTuber Dave Jones (who runs the channel EEVblog) wrote in an email to WIRED. “They are using the classic trick of using wording to imply that it will detect every type of microphone, when all they are probably doing is scanning for Bluetooth audio devices. It’s totally lame.” Baradari reiterates that the Spectre uses a combination of RF and Bluetooth low energy to detect microphones.

WIRED asked Baradari to share any evidence of the Spectre's effectiveness at identifying and blocking microphones in a person's vicinity. Baradari shared a few short video clips of people holding their phones to their ears listening to audio clips, which were presumably jammed by the Spectre, but these videos do little to prove that the device works.

Future Imperfect

Baradari has taken the critiques in stride, acknowledging that the tech is still in development. “I actually appreciate those comments, because they’re making me think and see more things as well,” Baradari says. “I do believe that with the ideas that we’re having and integrating into one device, these concerns can be addressed.”

People were quick to poke fun at the Spectre I online, calling the technology the cone of silence from Dune. Now, the Deveillance website reads, “Our goal is to make the cone of silence become reality.”

John Scott-Railton, a cybersecurity researcher at Citizen Lab, who is critical of the Spectre I, lauded the device’s virality as an indication of the real hunger for these kinds of gadgets to win back our privacy.

“The silver lining of this blowing up is that it is a Ring-like moment that highlights how quickly and intensely consumer attitudes have shifted around pervasive recording devices,” says Scott-Railton. “We need to be building products that do all the cool things that people want but that don’t have the massive privacy- and consent-violation undertow. You need device-level controls, and you need regulations of the companies that are doing this.”

Cooper Quintin, a senior staff technologist at the Electronic Frontier Foundation, echoed those sentiments, even if critics believe Deveillance’s efforts to be flawed.

“If this technology works, it could be a boon for many,” Quintin wrote in an email to WIRED. “It is nice to see a company creating something to protect privacy instead of working on new and creative ways to extract data from us.”



I’ve Tried Every Pixel Phone Ever Made—Here Are the Best to Buy Right Now


Portrait Light: You can change up the lighting in your portrait selfies after you take them by opening them up in Google Photos, tapping the Edit button, and heading to Actions > Portrait Light. This adds an artificial light you can place anywhere in the photo to brighten up your face and erase that 5 o’clock shadow. Use the slider at the bottom to tweak the strength of the light. It also works on older Portrait mode photos you may have captured. It works only on faces.

Health and Accessibility Features

Cough & Snore Detection (Tensor G2 and newer): On the Pixel 7 and newer, you can have your Pixel detect if you cough and snore when sleeping, provided you place your Pixel near your bed before you nod off. This will work only if you use Google’s Bedtime Mode feature, which you can turn on by heading to Settings > Digital Wellbeing & Parental Controls > Bedtime Mode.

Guided Frame (Tensor G2 and newer): For blind or low-vision people, the camera app can now help take a selfie with audio cues (it works with the front and rear cameras). You’ll need to enable TalkBack for this to work (Settings > Accessibility > TalkBack). Then open the camera app. It will automatically help you frame the shot.

Simple View: This mode makes the font size bigger, along with other elements on the screen, like widgets and quick-settings tiles. It also increases touch sensitivity, all of which hopefully makes it easier to see and use the screen. You can enable it by heading to Settings > Accessibility > Simple View.

Safety and Security Features

Theft Protection: This is a broader Android 15 feature, but essentially, Google’s algorithms can figure out if someone snatches your Pixel out of your hands. If they’re trying to get away, the device automatically locks. Additionally, with another device, you can use Remote Lock to lock your stolen Pixel with your phone number and a security answer. To toggle these features on, go to Settings > Security & privacy > Device unlock > Theft protection.

Identity Check: If your Pixel detects you’re in a new location, Identity Check will require your fingerprint or face authentication before you can make any changes to sensitive settings, offering extra peace of mind in case you lose your phone or if it’s stolen. You can enable this in Settings > Security & privacy > Device unlock > Theft protection > Identity Check.

Courtesy of Google

Private Space: Another Android 15 addition, Pixel phones finally have a feature that lets you hide and lock select apps. You can use a separate Google account, set a lock, and install any app to hide away. To set it all up, head to Settings > Security & privacy > Private space.

Satellite eSOS (Pixel 9 and Pixel 10 series, excluding Pixel 9a): Like Apple’s SOS feature on iPhones, you can now reach emergency contacts or emergency services even when you don’t have cell service or Wi-Fi connectivity. It’s not just available in the continental US, but also in Hawaii, Alaska, Canada, and even Europe.


