A new way to test how well AI systems classify text
Is this movie review a rave or a pan? Is this news story about business or technology? Is this online chatbot conversation veering off into giving financial advice? Is this online medical information site giving out misinformation?
These kinds of automated conversations, whether they involve seeking a movie or restaurant review or getting information about your bank account or health records, are becoming increasingly prevalent. More than ever, judgments like these are being made by highly sophisticated algorithms, known as text classifiers, rather than by human beings. But how can we tell how accurate those classifications really are?
Now, a team at MIT’s Laboratory for Information and Decision Systems (LIDS) has come up with an innovative approach that not only measures how well these classifiers are doing their job, but goes one step further and shows how to make them more accurate.
The new evaluation and remediation software was developed by Kalyan Veeramachaneni, a principal research scientist at LIDS, his students Lei Xu and Sarah Alnegheimish, and two others. The software package is being made freely available for download by anyone who wants to use it.
The team’s results were published on July 7 in the journal Expert Systems in a paper by Xu, Veeramachaneni, and Alnegheimish of LIDS, along with Laure Berti-Equille at IRD in Marseille, France, and Alfredo Cuesta-Infante at the Universidad Rey Juan Carlos, in Spain.
A standard method for testing these classification systems is to create what are known as synthetic examples: sentences that closely resemble ones that have already been classified. For example, researchers might take a sentence that has already been tagged by a classifier program as being a rave review, and see if changing a word or a few words while retaining the same meaning could fool the classifier into deeming it a pan. Or a sentence that was determined to be misinformation might get misclassified as accurate. Sentences that can fool the classifiers this way are known as adversarial examples.
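To picture what this probing looks like in practice, here is a minimal sketch that swaps in near-synonyms one word at a time and checks whether a toy classifier's label flips. It is purely illustrative: the classify() function, the synonym list, and the example sentence are hypothetical stand-ins, not the team's actual tooling.

```python
# A minimal sketch of single-word probing against a toy classifier.
# classify() and SYNONYMS are hypothetical stand-ins, not the team's code.

SYNONYMS = {
    "wonderful": ["great", "delightful"],
    "boring": ["dull", "tedious"],
}

def classify(sentence: str) -> str:
    """Toy sentiment classifier: labels a sentence 'rave' or 'pan'."""
    text = sentence.lower()
    return "rave" if ("wonderful" in text or "great" in text) else "pan"

def single_word_variants(sentence: str):
    """Yield copies of the sentence with exactly one word swapped for a near-synonym."""
    words = sentence.split()
    for i, word in enumerate(words):
        for alternative in SYNONYMS.get(word.lower(), []):
            yield " ".join(words[:i] + [alternative] + words[i + 1:])

original = "A wonderful film from start to finish"
base_label = classify(original)
for variant in single_word_variants(original):
    if classify(variant) != base_label:
        print("Label flipped:", variant)
```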
People have tried various ways to find the vulnerabilities in these classifiers, Veeramachaneni says. But existing methods struggle with the task and miss many adversarial examples that they should catch, he says.
Increasingly, companies are trying to use such evaluation tools in real time, monitoring the output of chatbots used for various purposes to try to make sure they are not putting out improper responses. For example, a bank might use a chatbot to respond to routine customer queries such as checking account balances or applying for a credit card, but it wants to ensure that its responses could never be interpreted as financial advice, which could expose the company to liability.
“Before showing the chatbot’s response to the end user, they want to use the text classifier to detect whether it’s giving financial advice or not,” Veeramachaneni says. But then it’s important to test that classifier to see how reliable its evaluations are.
“These chatbots, or summarization engines or whatnot, are being set up across the board,” he says, both to deal with external customers and for use within an organization, for example to provide information about HR issues. It’s important to put these text classifiers into the loop to detect things the chatbots are not supposed to say, and to filter those out before the output gets transmitted to the user.
That’s where the use of adversarial examples comes in—those sentences that have already been classified but then produce a different response when they are slightly modified while retaining the same meaning. How can people confirm that the meaning is the same? By using another large language model (LLM) that interprets and compares meanings.
So, if the LLM says the two sentences mean the same thing, but the classifier labels them differently, “that is a sentence that is adversarial—it can fool the classifier,” Veeramachaneni says. And when the researchers examined these adversarial sentences, “we found that most of the time, this was just a one-word change,” although the people using LLMs to generate these alternate sentences often didn’t realize that.
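In code, that check reduces to two conditions: an LLM-based judge says the paraphrase means the same thing, and the classifier nonetheless assigns it a different label. The sketch below assumes hypothetical classify() and meanings_match() helpers and shows only the logic of the test, not the published implementation.

```python
def is_adversarial(original: str, variant: str, classify, meanings_match) -> bool:
    """Flag a variant as adversarial when an LLM-based judge says the meaning is
    unchanged but the classifier assigns the two sentences different labels.
    Both classify() and meanings_match() are hypothetical callables supplied by the caller."""
    return meanings_match(original, variant) and classify(original) != classify(variant)
```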
Further investigation, using LLMs to analyze many thousands of examples, showed that certain specific words had an outsized influence in changing the classifications, so testing a classifier’s accuracy can focus on the small subset of words that make the most difference. In some applications, the team found that one-tenth of 1% of the system’s 30,000-word vocabulary, roughly 30 words, could account for almost half of these reversals of classification.
Lei Xu Ph.D. ’23, a recent graduate from LIDS who performed much of the analysis as part of his thesis work, “used a lot of interesting estimation techniques to figure out what are the most powerful words that can change the overall classification, that can fool the classifier,” Veeramachaneni says.
The goal is to make it possible to do much more narrowly targeted searches, rather than combing through all possible word substitutions, thus making the computational task of generating adversarial examples much more manageable. “He’s using large language models, interestingly enough, as a way to understand the power of a single word.”
Then, also using LLMs, he searches for other words that are closely related to these powerful words, and so on, allowing for an overall ranking of words according to their influence on the outcomes. Once these adversarial sentences have been found, they can be used in turn to retrain the classifier to take them into account, increasing the robustness of the classifier against those mistakes.
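A naive version of that ranking can be computed by counting, over a set of labeled sentences, how often substituting each vocabulary word flips the classifier's decision, then sorting by that count. The sketch below conveys only this bookkeeping, using hypothetical classify() and substitute() helpers; the estimation techniques described in the paper exist precisely to avoid such an exhaustive search.

```python
from collections import Counter

def rank_influential_words(sentences, vocabulary, classify, substitute):
    """Count how often swapping each vocabulary word into a sentence flips its label,
    then return the vocabulary sorted from most to least disruptive.
    classify() and substitute(sentence, position, word) are hypothetical helpers."""
    flips = Counter()
    for sentence in sentences:
        base_label = classify(sentence)
        for position in range(len(sentence.split())):
            for word in vocabulary:
                if classify(substitute(sentence, position, word)) != base_label:
                    flips[word] += 1
    return [word for word, _ in flips.most_common()]
```

In the retraining step the article describes, adversarial sentences found this way would presumably be folded back into the training data so the classifier learns to handle them consistently.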
Making classifiers more accurate may not sound like a big deal if it’s just a matter of classifying news articles into categories, or deciding whether reviews of anything from movies to restaurants are positive or negative. But increasingly, classifiers are being used in settings where the outcomes really do matter, whether that means preventing the inadvertent release of sensitive medical, financial, or security information, helping to guide important research into the properties of chemical compounds or the folding of proteins for biomedical applications, or identifying and blocking hate speech and known misinformation.
As a result of this research, the team introduced a new metric, which they call p, which provides a measure of how robust a given classifier is against single-word attacks. And because of the importance of such misclassifications, the research team has made its products available as open access for anyone to use. The package consists of two components: SP-Attack, which generates adversarial sentences to test classifiers in any particular application, and SP-Defense, which aims to improve the robustness of the classifier by generating and using adversarial sentences to retrain the model.
In some tests, where competing methods of testing classifier outputs allowed a 66% success rate by adversarial attacks, this team’s system cut that attack success rate almost in half, to 33.7%. In other applications, the improvement was as little as a 2% difference, but even that can be quite important, Veeramachaneni says, since these systems are being used for so many billions of interactions that even a small percentage can affect millions of transactions.
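For context, the attack success rate being quoted here is simply the fraction of attempted attacks that yield a sentence the classifier mislabels. A minimal sketch of that calculation, assuming a hypothetical attack() routine that returns a meaning-preserving variant or None, might look like this:

```python
def attack_success_rate(sentences, classify, attack) -> float:
    """Fraction of sentences for which attack() finds a meaning-preserving variant
    that the classifier labels differently. classify() and attack() are hypothetical."""
    successes = 0
    for sentence in sentences:
        variant = attack(sentence)
        if variant is not None and classify(variant) != classify(sentence):
            successes += 1
    return successes / len(sentences) if sentences else 0.0

# A rate of 0.66 means roughly two attacks in three fool the classifier;
# cutting it to 0.337 roughly halves that exposure.
```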
More information:
Lei Xu et al, Single Word Change Is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers, Expert Systems (2025). DOI: 10.1111/exsy.70079
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.
Two Thinking Machines Lab Cofounders Are Leaving to Rejoin OpenAI
Thinking Machines cofounders Barret Zoph and Luke Metz are leaving the fledgling AI lab and rejoining OpenAI, the ChatGPT-maker announced on Thursday. OpenAI’s CEO of applications, Fidji Simo, shared the news in a memo to staff Thursday afternoon.
The news was first reported on X by technology reporter Kylie Robison, who wrote that Zoph was fired for “unethical conduct.”
A source close to Thinking Machines said that Zoph had shared confidential company information with competitors. WIRED was unable to verify this information with Zoph, who did not immediately respond to WIRED’s request for comment.
Zoph told Thinking Machines CEO Mira Murati on Monday that he was considering leaving, and was then fired today, according to the memo from Simo. Simo goes on to write that OpenAI doesn’t share the same concerns about Zoph that Murati does.
The personnel shake-up is a major win for OpenAI, which recently lost its VP of research, Jerry Tworek.
Another Thinking Machines Lab staffer, Sam Schoenholz, is also rejoining OpenAI, the source said.
Zoph and Metz left OpenAI in late 2024 to start Thinking Machines with Murati, who had been the ChatGPT-maker’s chief technology officer.
This is a developing story. Please check back for updates.
Tech Workers Are Condemning ICE Even as Their CEOs Stay Quiet
Since Donald Trump returned to the White House last January, the biggest names in tech have mostly fallen in line with the new regime, attending dinners with officials, heaping praise upon the administration, presenting the president with lavish gifts, and pleading for Trump’s permission to sell their products to China. It’s been mostly business as usual for Silicon Valley over the past year, even as the administration ignored a wide range of constitutional norms and attempted to slap arbitrary fees on everything from chip exports to worker visas for high-skilled immigrants employed by tech firms.
But after an ICE agent shot and killed an unarmed US citizen, Renee Nicole Good, in broad daylight in Minneapolis last week, a number of tech leaders have begun publicly speaking out about the Trump administration’s tactics. These include prominent researchers at Google and Anthropic, who have denounced the killing as callous and immoral. The wealthiest and most powerful tech CEOs are still staying silent as ICE floods America’s streets, but now some researchers and engineers working for them have chosen to break ranks.
More than 150 tech workers have so far signed a petition asking their companies’ CEOs to call the White House, demand that ICE leave US cities, and speak out publicly against the agency’s recent violence. Anne Diemer, a human resources consultant and former Stripe employee who organized the petition, says that workers at Meta, Google, Amazon, OpenAI, TikTok, Spotify, Salesforce, LinkedIn, and Rippling are among those who have signed. The group plans to make the list public once it reaches 200 signatories.
“I think so many tech folks have felt like they can’t speak up,” Diemer told WIRED. “I want tech leaders to call the country’s leaders and condemn ICE’s actions, but even if this helps people find their people and take a small part in fighting fascism, then that’s cool, too.”
Nikhil Thorat, an engineer at Anthropic, said in a lengthy post on X that Good’s killing had “stirred something” in him. “A mother was gunned down in the street by ICE, and the government doesn’t even have the decency to perform a scripted condolence,” he wrote. Thorat added that the moral foundation of modern society is “infected, and is festering,” and the country is living through a “cosplay” of Nazi Germany, a time when people also stayed silent out of fear.
Jonathan Frankle, chief AI scientist at Databricks, added a “+1” to Thorat’s post. Shrisha Radhakrishna, chief technology and chief product officer of real estate platform Opendoor, replied that what happened to Good is “not normal. It’s immoral. The speed at which the administration is moving to dehumanize a mother is terrifying.” Other users who identified themselves as employees at OpenAI and Anthropic also responded in support of Thorat.
Shortly after Good was shot, Jeff Dean, an early Google employee and University of Minnesota graduate who is now the chief scientist at Google DeepMind and Google Research, began re-sharing posts with his 400,000 X followers criticizing the Trump administration’s immigration tactics, including one outlining circumstances in which deadly force isn’t justified for police officers interacting with moving vehicles.
He then weighed in himself. “This is completely not okay, and we can’t become numb to repeated instances of illegal and unconstitutional action by government agencies,” Dean wrote in an X post on January 10. “The recent days have been horrific.” He linked to a video of a teenager—identified as a US citizen—being violently arrested at a Target in Richfield, Minnesota.
In response to US Vice President JD Vance’s assertion on X that Good was trying to run over the ICE agent with her vehicle, Aaron Levie, the CEO of the cloud storage company Box, replied, “Why is he shooting after he’s fully out of harm’s way (2nd and 3rd shot)? Why doesn’t he just move away from the vehicle instead of standing in front of it?” He added a screenshot of a Justice Department webpage outlining best practices for law enforcement officers interacting with suspects in moving vehicles.
A Brain Mechanism Explains Why People Leave Certain Tasks for Later
How does procrastination arise? The reason you decide to postpone household chores and spend your time browsing social media could be explained by the workings of a brain circuit. Recent research has identified a neural connection responsible for delaying the start of activities associated with unpleasant experiences, even when these activities offer a clear reward.
The study, led by Ken-ichi Amemori, a neuroscientist at Kyoto University, aimed to analyze the brain mechanisms that reduce motivation to act when a task involves stress, punishment, or discomfort. To do this, the researchers designed an experiment with monkeys, a widely used model for understanding decision-making and motivation processes in the brain.
The scientists worked with two macaques that were trained to perform various decision-making tasks. In the first phase of the experiment, after a period of water restriction, the animals could activate one of two levers that released different amounts of liquid; one option offered a smaller reward and the other a larger one. This exercise allowed the researchers to evaluate how the value of the reward influences the willingness to perform an action.
In a later stage, the experimental design incorporated an unpleasant element. The monkeys were given the choice of drinking a moderate amount of water without negative consequences or drinking a larger amount on the condition of receiving a direct blast of air in the face. Although the reward was greater in the second option, it involved an uncomfortable experience.
As the researchers anticipated, the macaques’ motivation to complete the task and access the water decreased considerably when the aversive stimulus was introduced. This behavior allowed them to identify a brain circuit that acts as a brake on motivation in the face of anticipated adverse situations. In particular, they observed the involvement of the connection between the ventral striatum and the ventral pallidum, two structures in the brain’s basal ganglia known for their role in regulating pleasure, motivation, and reward.
The neural analysis revealed that when the brain anticipates an unpleasant event or potential punishment, the ventral striatum is activated and sends an inhibitory signal to the ventral pallidum, which is normally responsible for driving the intention to perform an action. In other words, this communication reduces the impulse to act when the task is associated with a negative experience.
The Brain Connection Behind Procrastination
To investigate the specific role of this connection, as described in the study published in the journal Current Biology, the researchers used a chemogenetic technique that, through the administration of a specialized drug, temporarily disrupted communication between the two brain regions. When they did so, the monkeys regained the motivation to initiate tasks, even in trials that involved the blast of air.
Notably, the drug produced no change in trials where the reward was not accompanied by punishment. This result suggests that the ventral striatum-ventral pallidum circuit does not regulate motivation in a general way, but rather is specifically activated to suppress it when there is an expectation of discomfort. In this sense, apathy toward unpleasant tasks appears to develop gradually as communication between these two regions intensifies.
Beyond explaining why people tend to unconsciously resist starting household chores or uncomfortable obligations, the findings have relevant implications for understanding disorders such as depression or schizophrenia, in which patients often experience a significant loss of the drive to act.
However, Amemori emphasizes that this circuit serves an essential protective function. “Overworking is very dangerous. This circuit protects us from burnout,” he said in comments reported by Nature. Therefore, he cautions that any attempt to externally modify this neural mechanism must be approached with care, as further research is needed to avoid interfering with the brain’s natural protective processes.
This story originally appeared in WIRED en Español and has been translated from Spanish.