Tech

A new way to test how well AI systems classify text

SP-Attack pipeline. High-flip-capacity words are used to conduct low-cost single-word adversarial attacks. Credit: Expert Systems (2025). DOI: 10.1111/exsy.70079

Is this movie review a rave or a pan? Is this news story about business or technology? Is this online chatbot conversation veering off into giving financial advice? Is this online medical information site giving out misinformation?

These kinds of automated conversations, whether they involve seeking a movie or restaurant review or getting information about your bank account or health records, are becoming increasingly prevalent. More than ever, such evaluations are being made by highly sophisticated algorithms, known as text classifiers, rather than by human beings. But how can we tell how accurate these classifications really are?

Now, a team at MIT’s Laboratory for Information and Decision Systems (LIDS) has come up with an innovative approach that not only measures how well these classifiers are doing their job but goes one step further and shows how to make them more accurate.

The new evaluation and remediation software was developed by Kalyan Veeramachaneni, a principal research scientist at LIDS, his students Lei Xu and Sarah Alnegheimish, and two others. The software is being made freely available for download by anyone who wants to use it.

The team’s results were published on July 7 in the journal Expert Systems in a paper by Xu, Veeramachaneni, and Alnegheimish of LIDS, along with Laure Berti-Equille at IRD in Marseille, France, and Alfredo Cuesta-Infante at the Universidad Rey Juan Carlos, in Spain.

A standard method for testing these classification systems is to create what are known as synthetic examples: sentences that closely resemble ones that have already been classified. For example, researchers might take a sentence that a classifier program has already tagged as a rave review and see whether changing one word or a few words, while retaining the same meaning, could fool the classifier into deeming it a pan. Or a sentence that was determined to be misinformation might get misclassified as accurate. This ability to fool the classifiers makes these sentences adversarial examples.

People have tried various ways to find the vulnerabilities in these classifiers, Veeramachaneni says. But existing methods of finding these vulnerabilities have a hard time with this task and miss many examples that they should catch, he says.

Increasingly, companies are trying to use such evaluation tools in real time, monitoring the output of chatbots used for various purposes to try to make sure they are not putting out improper responses. For example, a bank might use a chatbot to respond to routine customer queries such as checking account balances or applying for a credit card, but it wants to ensure that its responses could never be interpreted as financial advice, which could expose the company to liability.

“Before showing the chatbot’s response to the end user, they want to use the text classifier to detect whether it’s giving financial advice or not,” Veeramachaneni says. But then it’s important to test that classifier to see how reliable its evaluations are.

“These chatbots, or summarization engines or whatnot, are being set up across the board,” he says, to deal with external customers and within an organization as well, for example providing information about HR issues. It’s important to put these text classifiers into the loop to detect things the chatbots are not supposed to say, and filter those out before the output gets transmitted to the user.

That’s where the use of adversarial examples comes in—those sentences that have already been classified but then produce a different response when they are slightly modified while retaining the same meaning. How can people confirm that the meaning is the same? By using another large language model (LLM) that interprets and compares meanings.

So, if the LLM says the two sentences mean the same thing, but the classifier labels them differently, “that is a sentence that is adversarial—it can fool the classifier,” Veeramachaneni says. And when the researchers examined these adversarial sentences, “we found that most of the time, this was just a one-word change,” although the people using LLMs to generate these alternate sentences often didn’t realize that.
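
To make the test described above concrete, here is a minimal Python sketch; it is not the team’s SP-Attack code. It assumes a hypothetical keyword-based sentiment classifier (`toy_classify`), a hand-written substitution table standing in for LLM-generated rewrites, and a `same_meaning` check that, in a real pipeline, would be answered by an LLM but here is a placeholder that accepts every swap. A variant counts as adversarial when it differs by one word, keeps the meaning, and gets a different label.

```python
from typing import Callable, Iterator

# Hypothetical toy classifier: counts positive and negative cue words.
POSITIVE = {"great", "wonderful", "superb"}
NEGATIVE = {"awful", "dull"}

def toy_classify(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "rave" if score > 0 else "pan"

def single_word_variants(sentence: str, substitutes: dict[str, list[str]]) -> Iterator[str]:
    """Yield every sentence that differs from `sentence` in exactly one word."""
    words = sentence.split()
    for i, word in enumerate(words):
        for alt in substitutes.get(word.lower(), []):
            yield " ".join(words[:i] + [alt] + words[i + 1:])

def find_adversarial(
    sentence: str,
    classify: Callable[[str], str],
    same_meaning: Callable[[str, str], bool],
    substitutes: dict[str, list[str]],
) -> list[str]:
    """Return one-word variants that keep the meaning but change the label."""
    original_label = classify(sentence)
    return [
        variant
        for variant in single_word_variants(sentence, substitutes)
        if same_meaning(sentence, variant) and classify(variant) != original_label
    ]

def accept_all(a: str, b: str) -> bool:
    """Placeholder for an LLM meaning check; assumes the table holds only synonyms."""
    return True

print(find_adversarial("a great film", toy_classify, accept_all,
                       {"great": ["terrific", "wonderful"]}))  # ['a terrific film']
```

Here the toy classifier is fooled by “terrific” simply because that word is missing from its cue list, which is the kind of blind spot a one-word test is meant to expose.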

Further investigation, using LLMs to analyze many thousands of examples, showed that certain specific words had an outsized influence in changing the classifications, and therefore the testing of a classifier’s accuracy could focus on this small subset of words that seem to make the most difference. They found that one-tenth of 1% of all the 30,000 words in the system’s vocabulary could account for almost half of all these reversals of classification, in some specific applications.
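
For scale, one-tenth of 1% of a 30,000-word vocabulary is 0.001 × 30,000 = 30 words. The sketch below shows one simple way such a ranking could be tallied, assuming a log of (original, adversarial) sentence pairs that each differ by exactly one word. It counts the substituted-in word behind each label flip and reports what share of flips the top-ranked words explain. The function names and toy data are hypothetical, and a raw count is a simplification of the estimation techniques the team describes.

```python
from collections import Counter

def inserted_word(original: str, adversarial: str) -> str:
    """Return the word that replaced an original word in a one-word change."""
    a, b = original.split(), adversarial.split()
    changed = [new for old, new in zip(a, b) if old != new]
    if len(a) != len(b) or len(changed) != 1:
        raise ValueError("expected exactly one substituted word")
    return changed[0]

def rank_flip_words(flips: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank substituted-in words by how many label flips they caused."""
    return Counter(inserted_word(o, a) for o, a in flips).most_common()

def share_of_flips(ranking: list[tuple[str, int]], k: int) -> float:
    """Fraction of all flips explained by the k highest-ranked words."""
    total = sum(count for _, count in ranking)
    return sum(count for _, count in ranking[:k]) / total

# Toy log of single-word flips (hypothetical data).
flips = [
    ("a great film", "a terrific film"),
    ("great acting", "terrific acting"),
    ("a moving story", "a dull story"),
    ("a fine score", "a terrific score"),
]
ranking = rank_flip_words(flips)
print(ranking)                     # [('terrific', 3), ('dull', 1)]
print(share_of_flips(ranking, 1))  # 0.75
```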

Lei Xu Ph.D. ’23, a recent graduate from LIDS who performed much of the analysis as part of his thesis work, “used a lot of interesting estimation techniques to figure out what are the most powerful words that can change the overall classification, that can fool the classifier,” Veeramachaneni says.

The goal is to make it possible to do much more narrowly targeted searches, rather than combing through all possible word substitutions, thus making the computational task of generating adversarial examples much more manageable. “He’s using large language models, interestingly enough, as a way to understand the power of a single word,” Veeramachaneni says.
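
The saving from a narrower search can be sketched with a toy example, using a hypothetical classifier, synonym table and shortlist rather than SP-Attack itself. Both runs below try single-word synonym swaps and stop at the first label flip, but the targeted run only tries substitutes on a shortlist of high-flip-capacity words. As a rough worst case, a 20-word sentence checked against a full 30,000-word vocabulary could mean 20 × 30,000 = 600,000 candidates, versus 20 × 30 = 600 with a 30-word shortlist. For brevity, the sketch omits the LLM meaning check.

```python
# Hypothetical toy classifier: any negative cue word makes the review a pan.
NEGATIVE_CUES = {"subdued", "dull"}

def toy_classify(text: str) -> str:
    return "pan" if NEGATIVE_CUES & set(text.split()) else "rave"

def attack(sentence, classify, synonyms, shortlist=None):
    """Try one-word synonym swaps until the label flips.

    If `shortlist` is given, only substitutes on it are tried (targeted search).
    Returns (flipping sentence or None, number of classifier queries).
    """
    words = sentence.split()
    original = classify(sentence)
    queries = 1  # one query for the original sentence
    for i, word in enumerate(words):
        for alt in synonyms.get(word, []):
            if shortlist is not None and alt not in shortlist:
                continue  # skip substitutes with low flip capacity
            queries += 1
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            if classify(candidate) != original:
                return candidate, queries
    return None, queries

synonyms = {"quiet": ["calm", "gentle", "hushed", "subdued"]}
print(attack("a quiet film", toy_classify, synonyms))               # ('a subdued film', 5)
print(attack("a quiet film", toy_classify, synonyms, {"subdued"}))  # ('a subdued film', 2)
```

Both searches find the same flip, but the targeted one spends fewer classifier calls; on real vocabularies the gap is far larger than in this toy case.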

Then, also using LLMs, he searches for other words that are closely related to these powerful words, and so on, allowing for an overall ranking of words according to their influence on the outcomes. Once these adversarial sentences have been found, they can be used in turn to retrain the classifier to take them into account, increasing the robustness of the classifier against those mistakes.
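
The retraining step amounts to data augmentation: each adversarial sentence is added to the training set with the label of the sentence it came from, since the two mean the same thing. The sketch below illustrates the idea with a deliberately tiny bag-of-words model written from scratch. It is a hypothetical stand-in, not SP-Defense, which works with real classifiers and LLM-generated sentences.

```python
from collections import Counter, defaultdict

def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Toy model: count how often each word appears under each label."""
    model: dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        model[label].update(text.split())
    return model

def predict(model: dict[str, Counter], text: str) -> str:
    """Pick the label whose training words overlap most with `text`."""
    words = text.split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

def augment(examples, adversarial_pairs):
    """Add each adversarial sentence with the label of its original sentence."""
    labels = dict(examples)
    return examples + [(adv, labels[orig]) for orig, adv in adversarial_pairs]

base = [("a quiet film", "rave"), ("a subdued plot", "pan"), ("a dull plot", "pan")]
found = [("a quiet film", "a subdued film")]  # an adversarial pair from an attack

print(predict(train(base), "a subdued film"))                  # pan (fooled)
print(predict(train(augment(base, found)), "a subdued film"))  # rave
```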

Making classifiers more accurate may not sound like a big deal if it’s just a matter of classifying news articles into categories, or deciding whether reviews of anything from movies to restaurants are positive or negative. But increasingly, classifiers are being used in settings where the outcomes really do matter: preventing the inadvertent release of sensitive medical, financial, or security information; helping to guide important research, such as into the properties of chemical compounds or the folding of proteins for biomedical applications; or identifying and blocking hate speech or known misinformation.

As a result of this research, the team introduced a new metric, which they call p, which provides a measure of how robust a given classifier is against single-word attacks. And because of the importance of such misclassifications, the research team has made its products available as open access for anyone to use. The package consists of two components: SP-Attack, which generates adversarial sentences to test classifiers in any particular application, and SP-Defense, which aims to improve the robustness of the classifier by generating and using adversarial sentences to retrain the model.
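
The exact definition of p is not given here, so the following is only an illustrative stand-in, not the team’s metric: one simple way to score robustness to single-word attacks is the share of test sentences whose label no single-word swap from a candidate table can flip. The classifier and synonym table are hypothetical.

```python
# Hypothetical toy classifier: any negative cue word makes the review a pan.
NEGATIVE_CUES = {"subdued", "dull"}

def toy_classify(text: str) -> str:
    return "pan" if NEGATIVE_CUES & set(text.split()) else "rave"

def survives(sentence, classify, synonyms) -> bool:
    """True if no single-word swap from `synonyms` changes the label."""
    words = sentence.split()
    label = classify(sentence)
    return all(
        classify(" ".join(words[:i] + [alt] + words[i + 1:])) == label
        for i, word in enumerate(words)
        for alt in synonyms.get(word, [])
    )

def single_word_robustness(sentences, classify, synonyms) -> float:
    """Share of sentences that withstand every single-word swap (illustrative only)."""
    return sum(survives(s, classify, synonyms) for s in sentences) / len(sentences)

synonyms = {"quiet": ["calm", "subdued"], "great": ["superb"]}
print(single_word_robustness(["a quiet film", "a great cast"], toy_classify, synonyms))  # 0.5
```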

In some tests, where competing methods of testing classifier outputs allowed a 66% success rate by adversarial attacks, this team’s system cut that attack success rate almost in half, to 33.7%. In other applications, the improvement was as little as a 2% difference, but even that can be quite important, Veeramachaneni says, since these systems are being used for so many billions of interactions that even a small percentage can affect millions of transactions.
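
The figures can be checked with simple arithmetic. Going from a 66% to a 33.7% attack success rate is a drop of 32.3 percentage points, or about 49% in relative terms, which is why it is described as almost half. The second calculation assumes, for illustration only, one billion interactions and reads the 2% difference as 2 percentage points.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before

def interactions_affected(total: int, percentage_points: int) -> int:
    """How many interactions a given percentage of `total` represents."""
    return total * percentage_points // 100

print(f"{relative_reduction(66.0, 33.7):.1%}")  # 48.9%
print(interactions_affected(1_000_000_000, 2))  # 20000000
```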

More information:
Lei Xu et al, Single Word Change Is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers, Expert Systems (2025). DOI: 10.1111/exsy.70079

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation:
A new way to test how well AI systems classify text (2025, August 14)
retrieved 14 August 2025
from https://techxplore.com/news/2025-08-ai-text.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.





Hong Kong FWA services market set for 9.6% growth | Computer Weekly

Analysis from GlobalData is forecasting that fixed wireless access (FWA) service revenue in Hong Kong is expected to increase at a “healthy” compound annual growth rate (CAGR) of 9.6% between 2025 and 2030.
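
A compound annual growth rate compounds year on year, so, assuming five compounding years between 2025 and 2030, the forecast implies that revenue in 2030 would be about (1 + 0.096)^5 ≈ 1.58 times its 2025 level, or roughly 58% higher. The small Python check below uses no GlobalData revenue figures; the 100-to-158.14 pair in the inverse calculation is a made-up illustration.

```python
def growth_multiplier(cagr: float, years: int) -> float:
    """Factor by which a value grows at a constant compound annual rate."""
    return (1 + cagr) ** years

def implied_cagr(start: float, end: float, years: int) -> float:
    """Inverse: the constant annual rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1

print(round(growth_multiplier(0.096, 5), 3))     # 1.581
print(round(implied_cagr(100.0, 158.14, 5), 3))  # 0.096
```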

The latest Hong Kong Total Fixed Communications Forecast set out to quantify current and future demand and spending on mobile services for the special administrative region of China. It noted that growth was being driven by Hong Kong’s extensive 5G network coverage. Growth could also be attributed to local operators’ efforts to expand FWA services and position FWA as an alternative to traditional fibre broadband for both residential and commercial sectors, meeting growing demand for high-speed connectivity in areas where extending fibre lines is challenging.

“High-density urban and suburban centres of Hong Kong create a strong business case for FWA services due to their cost-effective and rapid deployments without the complex infrastructure and civil work required for extending fibre-optic lines to such locations,” said Neha Misra, senior analyst at GlobalData.

“Competitive, feature-rich plans from the operators will also help drive its adoption over the forecast period. For instance, HKBN’s 5G Home Broadband Plan provides unlimited 5G broadband data (subject to a 300GB fair-usage policy) for HKD118 per month on a 24-month contract, along with a seven-day trial guarantee. The plan also includes a waiver of the HKD28 monthly administration fee and complimentary access to the basic HomeShield security plan.”

In addition to HKBN, the study noted that operators such as 3 Hong Kong and HKT are also using their extensive 5G networks to offer home broadband services, particularly in areas with limited fibre infrastructure. It cited HKT as recently having successfully deployed mmWave-based FWA to deliver ultra-high-speed internet to rural areas and outlying islands.

“Growing demand for FWA provides operators a strong revenue opportunity by expanding home and SME broadband without the high capital intensity of fibre roll-out,” Misra added. “By leveraging nationwide 5G coverage, introducing competitively priced service plans and bundling digital home services, operators can unlock higher ARPU [average revenue per user], accelerate market penetration in underserved areas and diversify beyond traditional revenues.”

GlobalData believes the Hong Kong government’s smart city initiatives will also open new opportunities for FWA, especially 5G FWA, which can deliver high-speed internet to power applications such as the digital economy, digital governance and e-health services, while supporting the city’s dense urban environment and digital transformation goals under the Smart City Blueprint 2.0.

The original blueprint was set out in December 2017, outlining 76 initiatives under six smart areas, namely Smart Mobility, Smart Living, Smart Environment, Smart People, Smart Government and Smart Economy. Blueprint 2.0 puts forth more than 130 initiatives that continue to enhance and expand existing city management measures and services. The new initiatives aim to bring benefits and convenience to the public so that residents can better perceive the benefits of smart city innovation and technology.



Prague’s City Center Sparkles, Buzzes, and Burns at the Signal Festival

And thanks to a mention in Dan Brown’s new novel, The Secret of Secrets, the festival has gained even more global recognition. Just a few weeks after the release of Brown’s new bestseller set in contemporary Prague, viewers were able to see for themselves what drew the popular writer to the festival, which is the largest Czech and Central European showcase of digital art. In one passage, the Signal Festival has a cameo appearance when the novel’s protagonist recalls attending an event at the 2024 edition.

“We’re happy about it,” festival director Martin Pošta says about the mention. “It’s a kind of recognition.” Not that the event needed promotion, even in one of the most anticipated novels of recent years. The organizers have yet to share the number of visitors to the festival this year, but the four-day event typically attracts half a million visitors.

On the final day, there was a long queue in front of the monumental installation Tristan’s Ascension by American video art pioneer Bill Viola before it opened for the evening, even though it was a ticketed event. In the Church of St. Salvator in the Convent of St. Agnes, visitors could watch a Christ-like figure rise upwards, streams of water defying gravity along with him, all projected on a huge screen.

The festival premiere took place on the Vltava River near the Dvořák Embankment. Taiwan’s Peppercorns Interactive Media Art presented a projection on a cloud of mist called Tzolk’in Light. While creators of other light installations have to deal with the challenges of buildings (their irregular surfaces, decorative details, and awkward cornices), projecting onto water droplets is a challenge of a different kind: artists have to give up control over the resulting image. The shape and depth of the Peppercorns’ work depended on the wind at any given moment, which determined how much of the scene was revealed to viewers and how much was simply blown away. The reward, however, was an extraordinary 3D spectacle reminiscent of a hologram, something that can’t be achieved with video projections on static, flat buildings.

Another premiere event was a projection on the tower of the Old Town Hall, created for the festival by the Italian studio mammasONica. It transformed the 230-foot structure into a kaleidoscope of blue, green, red, and white surfaces. A short distance away, on Republic Square, Peppercorns had another installation. On a circular LED installation, they projected a work entitled Between Mountains and Seas, which recounted the history of Taiwan.





A framework for software development self-service | Computer Weekly

Software development is associated with the idea of not reinventing the wheel, which means developers often select components or software libraries with pre-built functionality, rather than write code to achieve the same result.

There are many benefits of this approach. For example, a software component that is widely deployed is likely to have undergone extensive testing and debugging. It is considered tried and trusted, mature technology, unlike brand-new code, which has not been thoroughly debugged and may inadvertently introduce unknown cyber security issues into the business.

The Lego analogy is often used to describe how these components can be put together to build enterprise applications. Developers can draw on functionality made available through application programming interfaces (APIs), which provide programmatic access to software libraries and components.

Increasingly, in the age of data-driven applications and greater use of artificial intelligence (AI), API access to data sources is another Lego brick that developers can use to create new software applications. And just as is the case with a set of old-school Lego bricks, constructing the application from the numerous software components available is left to the creativity of the software developer.

A Lego template for application development

To take the Lego analogy a bit further, there are instructions, templates and pathways developers can be encouraged to follow to build enterprise software that complies with corporate policies.

Roy Illsley, chief analyst, IT operations, at Omdia, defines an internal developer platform (IDP) as a developer self-service portal for accessing the tools and environments that the organisation’s IT strategy has designated as its standards. “A developer self-service platform provides a way for organisations to offer their developers almost pre-authorised assets, artefacts and tools that they can use to develop code,” he says.

The basic idea is to provide a governance framework with a suite of compliant tools. Bola Rotibi, chief of enterprise research at CCS Insight, says: “A developer self-service platform is really about trying to get a governance path.”

Rotibi regards the platform as “a golden path”, which provides developers who are not as skilled as more experienced colleagues a way to fast-track their work within a governance structure that allows them a certain degree of flexibility and creativity.

Why offering flexibility to developers is an important consideration comes down to developer experience and productivity. SnapLogic effectively provides modern middleware. It is used in digital transformation projects to connect disparate systems, and is now being repositioned for the age of agentic AI.

SnapLogic’s chief technology officer, Jeremiah Stone, says quite a few of the companies it has spoken to that identify as leaders in business transformation regard a developer portal offering self-service as something that goes hand-in-hand with digital infrastructure and AI-powered initiatives.

SnapLogic’s platform offers API management and service management, which manages the lifecycle of services, version control and documentation through a developer portal called the Dev Hub.

Stone says the capabilities of this platform extend from software developers to business technologists, and now AI users, who, he says, may be looking for a Model Context Protocol (MCP) endpoint.

Such know-how captured in a self-service developer portal enables users – whether they are software developers, or business users using low-code or no-code tooling – to connect AI with existing enterprise IT systems.

Enter Backstage

One platform that seems to have captured the minds of the developer community when it comes to developer self-service is Backstage. Having begun life internally at audio streaming site Spotify, Backstage is now an open source project managed by the Cloud Native Computing Foundation (CNCF).

Pia Nilsson, senior director of engineering at the streaming service, says: “At Spotify, we’ve learned that enabling developer self-service begins with standardisation. Traditional centralised processes create bottlenecks, but complete decentralisation can lead to chaos. The key is finding the middle ground – standardisation through design, where automation and clear workflows replace manual oversight.”

Used by two million developers, Backstage is an open source framework for building internal developer portals. Nilsson says Backstage provides a single, consistent entry point for all development activities – tools, services, documentation and data. She says this means “developers can move quickly while staying aligned with organisational standards”.

Nilsson points out that standardising the fleet of components that comprise an enterprise technology stack is sometimes regarded as a large migration effort, moving everyone onto a single version or consolidating products into one. However, she says: “While that’s a critical part of standardising the fleet, it’s even more important to figure out the intrinsic motivator for the organisation to keep it streamlined and learn to ‘self-heal’ tech fragmentation.”

According to Nilsson, this is why it is important to integrate all in-house-built tools, as well as all the developer tools the business has purchased, in the same IDP. Doing so, she notes, makes it very easy to spot duplication. “Engineers will only use what they enjoy using, and we usually enjoy using the stuff we built ourselves because it’s exactly what we need,” she says.
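
Nilsson’s point about duplication can be illustrated with a sketch of what a single catalog makes possible: once every in-house and purchased tool is registered in one place with the capability it provides, overlapping tools fall out of a simple grouping. The entry fields (`name`, `capability`, `origin`) and tool names are hypothetical; this is not Backstage’s catalog format or API.

```python
from collections import defaultdict

def find_duplicates(catalog: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group catalogued tools by capability and keep capabilities with 2+ tools."""
    by_capability: dict[str, list[str]] = defaultdict(list)
    for entry in catalog:
        by_capability[entry["capability"]].append(entry["name"])
    return {cap: sorted(names) for cap, names in by_capability.items() if len(names) > 1}

catalog = [
    {"name": "deploy-bot", "capability": "deployment", "origin": "in-house"},
    {"name": "vendor-deploy", "capability": "deployment", "origin": "purchased"},
    {"name": "docs-site", "capability": "documentation", "origin": "in-house"},
]
print(find_duplicates(catalog))  # {'deployment': ['deploy-bot', 'vendor-deploy']}
```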

The fact that Backstage is a framework is something IT leaders need to consider. In a recent blog post, Forrester analysts Christopher Condo and Lauren Alexander warned that most IDPs are frameworks that require assembly: “While many teams that implemented Backstage assumed that it would be an easy, free addition to their DevOps practices, that isn’t always the case. Backstage can be complex and requires engineering expertise to assemble, build and deploy.”

However, Forrester also notes that commercial IDP options are now available that include an orchestration layer on top of Backstage. These offer another option that may be a better fit for some organisations.

AI in an IDP

Beyond the assembly work organisations will need to carry out if they do not buy a commercial IDP, they must also consider AI. AI is revolutionising software development, and its impact needs to be taken into account in any decisions made around developer self-service and IDPs.

Spotify’s Nilsson believes it is important for IT leaders to figure out how to support AI tooling usage in the most impactful way for their company.

“Today, there is both a risk to not leveraging enough AI tools or having it very unevenly spread across the company, as well as the risk that some teams give in to the vibes and release low-quality code to production,” she says.

According to Nilsson, this is why the IT team responsible for the IDP needs to drive up the adoption of these tools and evaluate the impact over time. “At Spotify, we drive broad AI adoption through education and hack weeks, which we promote through our product Skill Exchange. We also help engineers use context-aware agentic tools,” she adds.

Looking ahead

In terms of AI tooling, an example of how developer self-service could evolve is the direction of travel SAP looks to be taking with its Joule AI copilot tool.

CCS Insight’s Rotibi believes the trend to integrate AI into developer tools and platforms is an area of opportunity for developer self-service platforms. Among the interesting topics Rotibi saw at the recent SAP TechEd conference in Berlin was the use of AI in SAP Joule.

SAP announced new AI assistants in Joule, which it said are able to coordinate multiple agents across workflows, departments and applications. According to SAP, these assistants plan, initiate and complete complex tasks spanning finance, supply chain, HR and beyond. 

“SAP Joule is an AI interface. It’s a bit more than just a chatbot. It is also a workbench,” says Rotibi. Given that Joule has access to the SAP product suite, she notes that, as well as providing access, Joule understands the products. “It knows all the features and functions SAP has worked on, and, behind the scenes, uses the best data model to get the data points the user wants,” she says.

Recognising that enterprise software developers will want to build their own applications and create their own integration between different pieces of software, she says SAP Joule effectively plays the role of a developer self-service portal for the SAP product suite.

Whatever comes next in AI-powered functionality, offering developer self-service brings numerous benefits for the overall developer experience, but it needs structure and standards.

Nilsson says: “When structure, automation and visibility are built into the developer experience, you replace bottlenecks with flow and create an environment where teams can innovate quickly, confidently and responsibly.”


