
A new way to test how well AI systems classify text



SP-Attack pipeline. High-flip-capacity words are used to conduct low-cost single-word adversarial attacks. Credit: Expert Systems (2025). DOI: 10.1111/exsy.70079

Is this movie review a rave or a pan? Is this news story about business or technology? Is this online chatbot conversation veering off into giving financial advice? Is this online medical information site giving out misinformation?

These kinds of automated conversations, whether they involve seeking a movie or restaurant review or getting information about your health records, are becoming increasingly prevalent. More than ever, such evaluations are being made by highly sophisticated algorithms, known as text classifiers, rather than by human beings. But how can we tell how accurate these classifications really are?

Now, a team at MIT’s Laboratory for Information and Decision Systems (LIDS) has come up with an innovative approach that not only measures how well these classifiers are doing their job, but goes one step further and shows how to make them more accurate.

The new evaluation and remediation software was developed by Kalyan Veeramachaneni, a principal research scientist at LIDS, his students Lei Xu and Sarah Alnegheimish, and two others. The software is being made freely available for download by anyone who wants to use it.

The team’s results were published on July 7 in the journal Expert Systems in a paper by Xu, Veeramachaneni, and Alnegheimish of LIDS, along with Laure Berti-Equille at IRD in Marseille, France, and Alfredo Cuesta-Infante at the Universidad Rey Juan Carlos, in Spain.

A standard method for testing these classification systems is to create what are known as synthetic examples—sentences that closely resemble ones that have already been classified. For example, researchers might take a sentence that has already been tagged by a classifier program as being a rave review, and see if changing a word or a few words while retaining the same meaning could fool the classifier into deeming it a pan. Or a sentence that was determined to be misinformation might get misclassified as accurate. Synthetic sentences that can fool the classifier in this way are known as adversarial examples.

People have tried various ways to find the vulnerabilities in these classifiers, Veeramachaneni says. But existing methods have a hard time with this task and miss many adversarial examples that they should catch, he says.

Increasingly, companies are trying to use such evaluation tools in real time, monitoring the output of chatbots used for various purposes to try to make sure they are not putting out improper responses. For example, a bank might use a chatbot to respond to routine customer queries such as checking account balances or applying for a credit card, but it wants to ensure that its responses could never be interpreted as financial advice, which could expose the company to liability.

“Before showing the chatbot’s response to the end user, they want to use the text classifier to detect whether it’s giving financial advice or not,” Veeramachaneni says. But then it’s important to test that classifier to see how reliable its evaluations are.

“These chatbots, or summarization engines or whatnot, are being set up across the board,” he says, to deal with external customers and within an organization as well, for example providing information about HR issues. It’s important to put these text classifiers into the loop to detect things that they are not supposed to say, and filter those out before the output gets transmitted to the user.

That’s where the use of adversarial examples comes in—those sentences that have already been classified but then produce a different response when they are slightly modified while retaining the same meaning. How can people confirm that the meaning is the same? By using another large language model (LLM) that interprets and compares meanings.

So, if the LLM says the two sentences mean the same thing, but the classifier labels them differently, “that is a sentence that is adversarial—it can fool the classifier,” Veeramachaneni says. And when the researchers examined these adversarial sentences, “we found that most of the time, this was just a one-word change,” although the people using LLMs to generate these alternate sentences often didn’t realize that.
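To make that check concrete, here is a minimal sketch of the adversarial test in Python, assuming an off-the-shelf sentiment classifier stands in for the classifier under test and a sentence-embedding similarity score stands in for the LLM meaning comparison; the model names and the 0.9 similarity threshold are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the adversarial check: a sentence pair is adversarial
# if the two sentences mean the same thing but the classifier labels them
# differently. Model choices and threshold are illustrative assumptions.
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

classifier = pipeline("sentiment-analysis")        # stand-in classifier under test
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # proxy for the LLM meaning check

def is_adversarial(original: str, modified: str, sim_threshold: float = 0.9) -> bool:
    same_meaning = util.cos_sim(
        encoder.encode(original), encoder.encode(modified)
    ).item() >= sim_threshold
    labels_differ = classifier(original)[0]["label"] != classifier(modified)[0]["label"]
    return same_meaning and labels_differ

# Example: a single-word substitution that may flip a sentiment classifier.
print(is_adversarial(
    "An absolutely wonderful, heartfelt film.",
    "An absolutely passable, heartfelt film.",
))
```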

Further investigation, using LLMs to analyze many thousands of examples, showed that certain specific words had an outsized influence in changing the classifications, and therefore the testing of a classifier’s accuracy could focus on this small subset of words that seem to make the most difference. They found that in some specific applications, one-tenth of 1% of the system’s 30,000-word vocabulary (around 30 words) could account for almost half of all these reversals of classification.

Lei Xu Ph.D. ’23, a recent graduate from LIDS who performed much of the analysis as part of his thesis work, “used a lot of interesting estimation techniques to figure out what are the most powerful words that can change the overall classification, that can fool the classifier,” Veeramachaneni says.

The goal is to make it possible to do much more narrowly targeted searches, rather than combing through all possible word substitutions, thus making the computational task of generating adversarial examples much more manageable. “He’s using large language models, interestingly enough, as a way to understand the power of a single word.”
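As a rough sketch of the idea, a word’s “flip capacity” could be estimated by brute force on a small labelled sample: count how often swapping the word into a sentence preserves the meaning but flips the label. The classify and paraphrase_ok helpers below are assumed to be supplied by the caller, and the paper’s actual estimation techniques are considerably more sophisticated than this exhaustive loop.

```python
# Hypothetical brute-force estimate of per-word "flip capacity": how often
# substituting the word into a sentence keeps the meaning (per the supplied
# paraphrase_ok check) yet flips the classifier's label.
from collections import Counter

def flip_capacity(candidate_words, sentences, classify, paraphrase_ok):
    flips = Counter()
    for sent in sentences:
        base_label = classify(sent)
        tokens = sent.split()
        for i in range(len(tokens)):
            for word in candidate_words:
                modified = " ".join(tokens[:i] + [word] + tokens[i + 1:])
                if paraphrase_ok(sent, modified) and classify(modified) != base_label:
                    flips[word] += 1
    return flips.most_common()   # words ranked by how often they flip labels
```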

Then, also using LLMs, he searches for other words that are closely related to these powerful words, and so on, allowing for an overall ranking of words according to their influence on the outcomes. Once these adversarial sentences have been found, they can be used in turn to retrain the classifier to take them into account, increasing the robustness of the classifier against those mistakes.
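The retraining step can be outlined as standard adversarial training: each adversarial sentence keeps the label of the original it was derived from, since its meaning is unchanged, and the augmented set is used to refit the model. This is a generic sketch of that loop, not the team’s SP-Defense implementation; find_adversarial and train_classifier are hypothetical stand-ins for the attack search and the training routine.

```python
# Generic adversarial-retraining outline (not the SP-Defense code itself).
def retrain_with_adversarial(train_data, classifier, find_adversarial, train_classifier):
    augmented = list(train_data)                  # train_data: [(sentence, label), ...]
    for sentence, label in train_data:
        for adv in find_adversarial(sentence, classifier):
            augmented.append((adv, label))        # same label: meaning was preserved
    return train_classifier(augmented)            # refit on the hardened dataset
```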

Making classifiers more accurate may not sound like a big deal if it’s just a matter of classifying news articles into categories, or deciding whether reviews of anything from movies to restaurants are positive or negative. But increasingly, classifiers are being used in settings where the outcomes really do matter, whether preventing the inadvertent release of sensitive medical, financial, or security information, or helping to guide important research, such as into properties of chemical compounds or the folding of proteins for biomedical applications, or in identifying and blocking hate speech or known misinformation.

As a result of this research, the team introduced a new metric, which they call p, which provides a measure of how robust a given classifier is against single-word attacks. And because of the importance of such misclassifications, the research team has made its products available as open access for anyone to use. The package consists of two components: SP-Attack, which generates adversarial sentences to test classifiers in any particular application, and SP-Defense, which aims to improve the robustness of the classifier by generating and using adversarial sentences to retrain the model.
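As a rough illustration of what such a robustness score could look like, one might report the fraction of test sentences whose label no single-word, meaning-preserving substitution manages to flip; the paper’s precise definition of p may differ from this sketch.

```python
# Illustrative single-word robustness score: the share of sentences that
# survive every single-word, meaning-preserving substitution attack.
def single_word_robustness(sentences, classify, paraphrase_ok, high_flip_words):
    survivors = 0
    for sent in sentences:
        base_label, tokens = classify(sent), sent.split()
        flipped = any(
            paraphrase_ok(sent, modified) and classify(modified) != base_label
            for i in range(len(tokens))
            for word in high_flip_words
            for modified in [" ".join(tokens[:i] + [word] + tokens[i + 1:])]
        )
        survivors += not flipped
    return survivors / len(sentences)             # 1.0 means no sentence was flipped
```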

In some tests, where competing methods of testing classifier outputs allowed a 66% success rate by adversarial attacks, this team’s system cut that attack success rate almost in half, to 33.7%. In other applications, the improvement was as little as a 2% difference, but even that can be quite important, Veeramachaneni says, since these systems are being used for so many billions of interactions that even a small percentage can affect millions of transactions.

More information:
Lei Xu et al, Single Word Change Is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers, Expert Systems (2025). DOI: 10.1111/exsy.70079

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation:
A new way to test how well AI systems classify text (2025, August 14)
retrieved 14 August 2025
from https://techxplore.com/news/2025-08-ai-text.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.






Europe’s Online Age Verification App Is Here



The European online age verification app is ready.

The app works with passports or ID cards, is built to be “completely anonymous” for the people who use it, works on any device (smartphones, tablets, and PCs), and is open source. “Best of all, online platforms can easily rely on our age verification app, so there are no more excuses,” said European Commission president Ursula von der Leyen at a press conference on Wednesday. “Europe offers a free and easy-to-use solution that can protect our children from harmful and illegal content.”

High Expectations

“It is our duty to protect our children in the online world just as we do in the offline world. And to do that effectively, we need a harmonized European approach,” von der Leyen said at Wednesday’s press conference. “And one of the central issues is the question, how can we ensure a technical solution for age verification that is valid throughout Europe? Today, I can announce that we have the answer.”

This answer takes the form of an open source app that any private company can repurpose, as long as it complies with European privacy standards and offers the same technical solution throughout the European Union. The user downloads the app, agrees to the terms and conditions, sets up a PIN or biometric access, and proves their age through an electronic identification system, or by showing a passport or ID card (in which case biometric verification is also provided). The app does not store your name, date of birth, ID number, or any other personal information, according to the European Commission—only the fact that you are over a certain age.

After that, when a person using the app wants to access a social network (minimum age: 13), pornographic site (minimum age: 18), or any other age-protected content, if they are logged in from a computer, they need only scan the QR code shown on the site they want to visit. If, on the other hand, the person logs in from a smartphone, the app sends the proof of age directly. The platform never accesses the document with which the user proved their age in the first place.
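The privacy property described here, that a platform learns only an age fact tied to a fresh challenge and never sees the underlying document, can be illustrated with a toy signed-attestation exchange. This is an assumption-laden sketch, not the EU app’s actual protocol; the article does not detail its message formats or key management.

```python
# Toy illustration of a privacy-preserving age attestation (not the EU
# app's real protocol): the platform verifies a signed "over 18" claim
# bound to its one-time challenge, and never sees the ID document.
import json, os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

issuer_key = Ed25519PrivateKey.generate()   # stands in for the app's issuer key
issuer_pub = issuer_key.public_key()        # known to the platform in advance

# The platform's QR code carries a one-time challenge (nonce).
nonce = os.urandom(16).hex()

# The app responds with only the age fact and the nonce: no name, no ID number.
attestation = json.dumps({"over_18": True, "nonce": nonce}).encode()
signature = issuer_key.sign(attestation)

# The platform checks the signature and the challenge, nothing more.
issuer_pub.verify(signature, attestation)   # raises InvalidSignature if forged
claim = json.loads(attestation)
assert claim["over_18"] and claim["nonce"] == nonce
```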

Adoption Event

The need to introduce a common system for the entire European Union has been discussed for some time, and according to commission technicians, the technical work is now complete. Of course, it will still be possible to circumvent the system—all it takes is for an adult to lend their phone to a younger friend—but the technological architecture exists, and it will be up to EU member states to decide whether to integrate it into national digital wallets or develop independent apps.

“No More Excuses”

For the app to really be effective, platforms must be obligated to verify the age of their users—that’s where things get tricky. The Digital Services Act, which went into effect in 2024, requires “very large online platforms”—those with more than 45 million monthly users in the European Union—to take concrete steps to mitigate systemic risks related to child protection, with heavy penalties for noncompliance.

“And that’s why Europe has the DSA: to call online platforms to their responsibilities. Because Europe will not tolerate platforms making money at the expense of our children,” European Commission executive vice president Henna Virkkunen told a press conference. She added that after an investigation into TikTok, the European institutions plan to take similar action against Facebook, Instagram, and Snapchat, as well as four porn sites. “Since the platforms do not have adequate age verification tools, we developed the solution ourselves,” she concluded. In short, as von der Leyen also remarked, “there are no more excuses.”

Bare Minimum

So far, this is the European framework that sets the general rules. On this basis, member states can consider more restrictive measures. Italy was among the first to discuss how to regulate the use of social media by minors but has so far not landed on anything concrete. Elsewhere in the EU, France’s Emmanuel Macron has been a trailblazer on the issue, pushing France to discuss a rule to ban social networks for minors under the age of 15 entirely. So far, this measure has received broad political support—but the outcome depends largely on compatibility with the Digital Services Act and the availability of effective age verification systems like the app the European Commission just released.

This article originally appeared on WIRED Italia and has been translated.




Anthropic Plots Major London Expansion



Anthropic is moving into a new London office as it seeks to expand its research and commercial footprint in Europe, setting up a scrap between the leading AI labs for talent emerging from British universities.

The company, which opened its first London office in 2023, is moving to the same neighborhood as Google DeepMind, OpenAI, Meta, Wayve, Isomorphic Labs, Synthesia, and various AI research institutions.

Anthropic’s new, 158,000-square-foot office footprint will have space enough for 800 people—four times its current head count—giving it room to potentially outscale OpenAI, which itself recently announced an expansion in London.

“Europe’s largest businesses and fastest-growing startups are choosing Claude, and we’re scaling to match,” says Pip White, head of EMEA North at Anthropic. “The UK combines ambitious enterprises and institutions that understand what’s at stake with AI safety with an exceptional pool of AI talent—we want to be where all of that comes together.”

UK government officials had reportedly attempted to coax Anthropic into expanding its presence in London after the company recently fell out with the US administration. Anthropic refused to allow its models to be used in mass surveillance and autonomous weapon systems, leading to an ongoing legal battle between the AI lab and the Pentagon.

As part of the expansion, Anthropic says it will deepen its work with the UK’s AI Security Institute, a government body that this week published a risk evaluation of its latest model, Claude Mythos Preview. According to Politico, the UK government is one of few across Europe to have been granted access to the model, which Anthropic has released to only select parties, citing concerns over the potential for its abuse by cybercriminals.

The increasing concentration of AI companies in the same London district is an important step in creating a pathway for research to translate into AI products, says Geraint Rees, vice-provost at University College London, whose campus is around the corner from Anthropic’s new office.

“This cluster didn’t emerge from a planning document. It grew because serious researchers and companies understand that proximity isn’t a nice-to-have,” he said last month, speaking at an event attended by WIRED. “That’s how the innovation system actually works. It’s not a clean, linear transfer from lab to market. It’s messier, richer, more human than that.”




CYBERUK ’26: UK lagging on legal protections for cyber pros | Computer Weekly



The increasingly long-in-the-tooth Computer Misuse Act (CMA) of 1990 remains an albatross around the neck of British cyber security professionals, and even though the UK government committed last December to reforming it, every minute of delay is holding back the nation’s security innovation, resilience, talent, and ability to defend itself against cyber attacks, campaigners have warned.

Ahead of the National Cyber Security Centre’s (NCSC’s) upcoming CYBERUK conference in Glasgow, the CyberUp Campaign for reform of the CMA has published a new report, titled Protections for Cyber Researchers: How the UK is being left behind, to maintain pressure on Westminster.

The CMA defines the vague offence of unauthorised access to a computer, which the campaigners want changed because it was written 35 years ago and fails to account for the development of the cyber security profession, and the fact that in the course of their day-to-day work, cyber pros may sometimes need to hack into other systems.

“Cyber attacks are growing in scale, sophistication and severity, with a devastating impact on infrastructure, businesses and charities,” said a CyberUp campaign spokesperson.

“While other countries have moved to refresh their cyber laws in response, the UK’s Computer Misuse Act hasn’t been updated since before the modern internet – hardly the best platform for accelerating our defences into the next decade.”

The group’s report highlights how other nations – Australia, Belgium, France, Germany, Hong Kong, Malta, Portugal, and the USA – have already secured legal protections for cyber professionals that enable them to go about their business without fear of prosecution.

In Portugal – Britain’s oldest formal ally under a treaty dating back to the 14th Century – the government last year published Decreto-Lei 125/2025, implementing the European Union (EU) Network and Information Systems (NIS2) Directive and revising the country’s cyber crime law to ensure that ethical hackers and professional cyber security practitioners working in good faith are both recognised and protected.

Portugal’s laws now accept that some elements of cyber work may have to happen without explicit permission or involve unanticipated technical overreach that has a legitimate purpose.

As such, Portugal says that security work undertaken in good faith won’t be punished as long as the researcher fulfils a set of conditions. For example, they can act only to find vulnerabilities, and these must be reported immediately; they must avoid taking harmful actions, like conducting DDoS attacks or installing malware; and they must respect the integrity of any data they may find or access, deleting it within 10 days once the issue is addressed.

CyberUp said Portugal’s example demonstrates how cyber crime laws can be modernised to legally protect research carried out in the public interest.

“Portugal has demonstrated how to modernise their equivalent law through cyber legislation. We urge the government to follow this example and act swiftly through the Cyber Security and Resilience Bill to achieve meaningful reform, or risk lagging even further behind our peers,” the spokesperson said.

Defence Framework

Working with cyber security experts and legal advisors, the CyberUp campaign has developed its own Defence Framework that would allow cyber professionals to present a statutory defence in court as long as they adhere to the Framework’s four core principles.

  • Harm vs. Benefit: The benefits of the activity must outweigh the potential harms;
  • Proportionality: Cyber pros must take all reasonable steps to minimise the risks of their activity;
  • Intent: They must act honestly, sincerely, and clearly direct themselves towards improving security;
  • Competence: Their qualifications and professional memberships should demonstrate they are suitably equipped to perform cyber security work.

The campaigners say this framework will bring clarity and confidence to the security sector, enabling cyber pros to run essential research tasks without fear of criminal prosecution, helping organisations operate to recognised legal standards, and enabling a more open and collaborative relationship between the cyber sector and the UK government.


