Tech
Lay intuition as effective at jailbreaking AI chatbots as technical methods, research suggests
It doesn’t take technical expertise to work around the built-in guardrails of artificial intelligence (AI) chatbots like ChatGPT and Gemini, which are intended to ensure that the chatbots operate within a set of legal and ethical boundaries and do not discriminate against people of a certain age, race or gender.
A single, intuitive question can trigger the same biased response from an AI model as advanced technical inquiries, according to a team led by researchers at Penn State.
“A lot of research on AI bias has relied on sophisticated ‘jailbreak’ techniques,” said Amulya Yadav, associate professor at Penn State’s College of Information Sciences and Technology. “These methods often involve generating strings of random characters computed by algorithms to trick models into revealing discriminatory responses.
“While such techniques prove these biases exist theoretically, they don’t reflect how real people use AI. The average user isn’t reverse-engineering token probabilities or pasting cryptic character sequences into ChatGPT—they type plain, intuitive prompts. And that lived reality is what this approach captures.”
Prior work probing AI bias (skewed or discriminatory outputs from AI systems caused by human influences in the training data, like language or cultural bias) has been done by experts using technical knowledge to engineer large language model (LLM) responses. To see how average internet users encounter biases in AI-powered chatbots, the researchers studied the entries submitted to a competition called “Bias-a-Thon.” Organized by Penn State’s Center for Socially Responsible AI (CSRAI), the competition challenged contestants to come up with prompts that would lead generative AI systems to respond with biased answers.
They found that the intuitive strategies employed by everyday users were just as effective at inducing biased responses as expert technical strategies. The researchers presented their findings at the 8th AAAI/ACM Conference on AI, Ethics, and Society.
Fifty-two individuals participated in the Bias-a-Thon, submitting screenshots of 75 prompts and AI responses from eight generative AI models. They also provided an explanation of the bias or stereotype that they identified in the response, such as age-related or historical bias.
The researchers conducted Zoom interviews with a subset of the participants to better understand their prompting strategies and their conceptions of ideas like fairness, representation and stereotyping when interacting with generative AI tools. Once they arrived at a participant-informed working definition of “bias”—which included a lack of representation, stereotypes and prejudice, and unjustified preferences toward groups—the researchers tested the contest prompts in several LLMs to see if they would elicit similar responses.

“Large language models are inherently random,” said lead author Hangzhi Guo, a doctoral candidate in information sciences and technology at Penn State. “If you ask the same question to these models two times, they might return different answers. We wanted to use only the prompts that were reproducible, meaning that they yielded similar responses across LLMs.”
The researchers found that 53 of the prompts generated reproducible results. Biases fell into eight categories: gender bias; race, ethnic and religious bias; age bias; disability bias; language bias; historical bias favoring Western nations; cultural bias; and political bias.
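As a rough illustration of what a reproducibility filter like this could look like, here is a minimal Python sketch. It is not the researchers' code: the function name `reproducible_prompts`, the 0.75 agreement threshold and the example data are all hypothetical, and it assumes a reviewer has already marked, for each model, whether the response showed the reported bias. The last line only restates the article's own figures, 53 reproducible prompts out of 75 submitted.

```python
from typing import Dict, List


def reproducible_prompts(
    observations: Dict[str, List[bool]],
    min_agreement: float = 0.75,  # hypothetical threshold, not from the study
) -> List[str]:
    """Keep prompts whose reported bias showed up in enough model responses.

    observations maps each prompt to one flag per model tested:
    True if that model's response showed the reported bias.
    """
    kept = []
    for prompt, flags in observations.items():
        if not flags:
            continue  # no model responses recorded for this prompt
        share = sum(flags) / len(flags)
        if share >= min_agreement:
            kept.append(prompt)
    return kept


# Made-up example data: three prompts, four models each.
example = {
    "prompt A": [True, True, False, True],
    "prompt B": [False, False, True, False],
    "prompt C": [True, True, True, True],
}
kept = reproducible_prompts(example)

# Share of contest prompts the article reports as reproducible.
share_reported = 53 / 75
```

With this made-up data, prompts A and C pass and prompt B is dropped. Applied to the article's numbers, about 71% of the submitted prompts were reproducible.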
The researchers also found that participants used seven strategies to elicit these biases: role-playing, or asking the LLM to assume a persona; hypothetical scenarios; using human knowledge to ask about niche topics, where it’s easier to identify biased responses; using leading questions on controversial topics; probing biases in under-represented groups; feeding the LLM false information; and framing the task as having a research purpose.
“The competition revealed a completely fresh set of biases,” said Yadav, organizer of the Bias-a-Thon. “For example, the winning entry uncovered an uncanny preference for conventional beauty standards. The LLMs consistently deemed a person with a clear face to be more trustworthy than a person with facial acne, or a person with high cheekbones more employable than a person with low cheekbones.
“This illustrates how average users can help us uncover blind spots in our understanding of where LLMs are biased. There may be many more examples such as these that have been overlooked by the jailbreaking literature on LLM bias.”
The researchers described mitigating biases in LLMs as a cat-and-mouse game, meaning that developers are constantly addressing issues as they arise. They suggested strategies that developers can use to mitigate these issues now, including implementing a robust classification filter to screen outputs before they go to users, conducting extensive testing, educating users and providing specific references or citations so users can verify information.
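The first of those suggestions, a classification filter that screens outputs before they reach users, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method described by the researchers: `screen_output`, `toy_score`, the 0.5 threshold and the fallback message are hypothetical, and the keyword scorer is a stand-in for whatever trained classifier a developer would actually use.

```python
from typing import Callable

# Hypothetical message shown when a response is held back.
FALLBACK = "This response was withheld for review."


def screen_output(
    text: str,
    bias_score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> str:
    """Return the model's text, or a fallback if the classifier flags it."""
    if bias_score(text) >= threshold:
        return FALLBACK
    return text


def toy_score(text: str) -> float:
    """Placeholder scorer: flags a couple of sweeping-generalization phrases.

    A real deployment would call a trained classifier here instead.
    """
    markers = ("all of them", "those people always")
    lowered = text.lower()
    return 1.0 if any(marker in lowered for marker in markers) else 0.0


passed = screen_output("The meeting is at noon.", toy_score)
blocked = screen_output("Those people always arrive late.", toy_score)
```

Passing the scorer in as a function keeps the screening step separate from the classifier, so the classifier can be swapped or retrained as new kinds of bias surface, which fits the cat-and-mouse dynamic the researchers describe.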
“By shining a light on inherent and reproducible biases that laypersons can identify, the Bias-a-Thon serves an AI literacy function,” said co-author S. Shyam Sundar, Evan Pugh University Professor at Penn State and director of the Penn State Center for Socially Responsible Artificial Intelligence, which has since organized other AI competitions such as Fake-a-thon, Diagnose-a-thon and Cheat-a-thon.
“The whole goal of these efforts is to increase awareness of systematic problems with AI, to promote the informed use of AI among laypersons and to stimulate more socially responsible ways of developing these tools.”
More information:
Hangzhi Guo et al, Exposing AI Bias by Crowdsourcing: Democratizing Critique of Large Language Models, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (2025). DOI: 10.1609/aies.v8i2.36620
Citation: Lay intuition as effective at jailbreaking AI chatbots as technical methods, research suggests (2025, November 4), retrieved 4 November 2025 from https://techxplore.com/news/2025-11-lay-intuition-effective-jailbreaking-ai.html
Tech
AI Research Is Getting Harder to Separate From Geopolitics
The world’s top AI research conference, the Conference on Neural Information Processing Systems, better known as NeurIPS, this week became the latest organization embroiled in a growing clash between geopolitics and global scientific collaboration. The conference’s organizers announced and then quickly reversed controversial new restrictions for international participants after Chinese AI researchers threatened to boycott the event.
“This is a potential watershed moment,” says Paul Triolo, a partner at the advisory firm DGA-Albright Stonebridge who studies US-China relations. Triolo argues that attracting Chinese researchers to NeurIPS is beneficial to US interests, but some American officials have pushed for American and Chinese scientists to decouple their work—especially in AI, which has become a particularly sensitive topic in Washington.
The incident could deepen political tensions around AI research, as well as dissuade Chinese scientists from working at US universities and tech companies in the future. “At some level now it is going to be hard to keep basic AI research out of the [political] picture,” Triolo says.
In the conference’s annual handbook for paper submissions, issued in mid-March, NeurIPS organizers announced updated restrictions for participation. The rules stated that the event could not provide services including “peer review, editing, and publishing” to any organizations subject to US sanctions, and linked to a database of sanctioned entities. That database included companies and organizations on the Bureau of Industry and Security’s entity list and those on another list with alleged ties to the Chinese military.
The new rules would have affected researchers at Chinese companies like Tencent and Huawei who regularly present work at NeurIPS. The database also includes entities from other countries such as Russia and Iran. The US places limits on doing business with these organizations, but there are no rules around academic publishing or conference participation.
The NeurIPS handbook has since been updated to specify that the restrictions apply only to Specially Designated Nationals and Blocked Persons, a list used primarily for terrorist groups and criminal organizations.
“In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow,” the event’s organizers said in a statement issued Friday. “This error was due to miscommunication between the NeurIPS Foundation and our legal team.”
Before they reversed course, the conference organizers initially said that the new rule was “about legal requirements that apply to the NeurIPS Foundation, which is responsible for complying with sanctions,” adding that it was seeking legal consultation on the issue.
Immediate Backlash
The new rule drew swift backlash from AI researchers around the world, particularly in China, which produces a large quantity of cutting-edge machine learning papers and is home to a growing share of the world’s top AI talent. Several academic groups there issued statements condemning the measure and, more importantly, discouraging Chinese academics from attending NeurIPS in the future. Some urged Chinese academics to contribute instead to domestic research conferences, potentially helping increase the country’s influence in relevant science and tech fields.
The China Association of Science and Technology (CAST), an influential government-affiliated organization for scientists and engineers, said Thursday that it would stop providing funding for Chinese scholars traveling to attend NeurIPS and would use the money instead to support domestic and international conferences that “respect the rights of Chinese scholars.”
CAST also said it will no longer count publications at the 2026 NeurIPS conference as academic achievements when evaluating future research funding. It’s unclear if the organization will reverse course now that NeurIPS has walked back the new rule.
Tech
Iranian Hackers Breached Kash Patel’s Email—but Not the FBI’s
Handala’s second claim, however—that it hacked the FBI—seems, for now, to be fiction. All evidence points to Handala having breached Patel’s older, personal Gmail account. Widely believed to be a “hacktivist” front for Iran’s intelligence agency the MOIS, Handala suggested on its website that the emails contained classified information, but the messages initially reviewed by WIRED didn’t appear to be related to any government work. TechCrunch did find, however, that Patel appears to have forwarded some emails from his Justice Department email account to his Gmail account in 2014.
Handala, which cybersecurity experts have described to WIRED as an “opportunistic” hacker group whose cyberattacks and breaches are often calculated more for their propaganda value than their tactical impacts, has nonetheless made the most of Patel’s embarrassing breach. “To the whole world, we declare: the FBI is just a name, and behind this name, there is no real security,” the group wrote in its statement. “If your director can be compromised this easily, what do you expect from your lower-level employees?”
Handala Hackers Put $50 Million Bounty on Trump and Netanyahu’s Heads
For further evidence of Handala’s bombastic rhetoric, look no further than another post on its website earlier this week (we’re intentionally not linking to it) that offered a $50 million bounty to anyone who could “eliminate” US president Donald Trump and Israeli prime minister Benjamin Netanyahu. “This substantial prize will be awarded, directly and securely, to any individual or group bold enough to show true action against tyranny,” the hackers’ statement read, along with an invitation to any would-be assassins to reach out via the encrypted messaging app Session. “All our communication and payment channels utilize the latest encryption and anonymization technologies, your safety and confidentiality are fully guaranteed.”
That bounty, Handala explained, was posted in answer to a statement about Handala published on the US Department of Justice website last week that offered $10 million for information leading to the identity or location of anyone who carries out “malicious cyber activities against US critical infrastructure” on behalf of a foreign government.
“Our message is clear: If you truly have the will and the power, come and find us!” Handala wrote in its response. “We fear no challenge and are prepared to respond to every attack with even greater force.”
In yet another post on its website this week, Handala also claimed to have doxed 28 engineers at military contractor Lockheed Martin working in Israel and threatened them with personal harm if they didn’t leave the country within 48 hours. When WIRED tried calling the phone numbers included in Handala’s leaked data, however, most of them didn’t work.
Apple says no device with its Lockdown Mode security feature enabled has ever been successfully compromised by mercenary spyware in the nearly four years since its launch. Amnesty International’s security lab head, Donncha Ó Cearbhaill, also says his team has seen no evidence of a successful attack against a Lockdown Mode–enabled iPhone. And Citizen Lab, which has documented several successful spyware attacks against iPhones, says none involve a Lockdown Mode bypass, while in two cases its researchers found the feature actively blocked attacks against NSO Group’s Pegasus and Intellexa’s Predator. Google researchers, meanwhile, found one spyware strain that simply abandons infection attempts when it detects the feature is enabled.
Lockdown Mode works by disabling commonly exploited iPhone features, such as most message attachment types and features like links and link previews. Incoming FaceTime calls are blocked unless the user has previously called that person within the past 30 days. When the iPhone is locked, it blocks connections with computers and accessories. The device will not automatically join nonsecure Wi-Fi networks, and 2G and 3G support is disabled. Apple has also doubled bounties for researchers who detect any Lockdown Mode bypass, with payouts up to $2 million.
Tech
This Premium Sennheiser Soundbar Is $1,000 Off
Looking for an all-in-one soundbar that sounds as big as it looks? Sennheiser’s Ambeo Max uses its oversized body to produce beefy, enveloping sound, and right now you can grab it for just $2,000 at Best Buy, a sizable $1,000 markdown from the usual list price. It’s one of our favorite standalone premium soundbars, particularly if you don’t want to deal with an exterior subwoofer but still want bigger bass than you’re likely to find on smaller options.
While it might be a bit larger than your average soundbar, Sennheiser uses the space well, packing a ton of functionality and drivers into the less-than-compact body. Full-range drivers and 1-inch tweeters fire in every conceivable direction, and the result is an impressive reproduction of true spatial audio, something few other standalone bars can claim. The large cabinet also delivers an impressive low end, with bass that doesn’t rival dedicated subwoofers but comes really close, with a much simpler setup process.
The larger footprint also allows for a huge number of inputs, more than you’re likely to find on those tiny soundbars that slide under your screen. In addition to an HDMI 2.1 output with eARC, you’ll get three HDMI inputs with 4K pass-through at 60Hz, USB, Ethernet, and optical audio. There are even RCA ports in case you want to hook this up to your turntable. There’s also a dedicated subwoofer output, in case you decide you want to add one to your setup down the road, giving you a ton of options should you decide to put the Ambeo Max at the center of your home audio setup.
Ready to make the move to a bigger, better soundbar? Swing on over to Best Buy to grab this hefty discount on the Sennheiser Ambeo Max, or check out our guide to the best premium soundbars for some of our other favorite picks. If you’re just out looking for a great deal in general, the Amazon Big Spring Sale is underway, and we’ve got a dedicated post with all the best discounts on everything from smartwatches to water bottles.