Tech
Large language models provide unreliable answers about public services, Open Data Institute finds | Computer Weekly
Popular large language models (LLMs) are unable to provide reliable information about key public services such as health, taxes and benefits, the Open Data Institute (ODI) has found.
Drawing on more than 22,000 LLM prompts designed to reflect the kind of questions people would ask artificial intelligence (AI)-powered chatbots, such as, “How do I apply for universal credit?”, the data raises concerns about whether chatbots can be trusted to give accurate information about government services.
The publication of the research follows the UK government’s announcement of partnerships with Meta and Anthropic at the end of January 2026 to develop AI-powered assistants for navigating public services.
“If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot,” said Elena Simperl, the ODI’s director of research.
Responses from models – including Anthropic’s Claude-4.5-Haiku, Google’s Gemini-3-Flash and OpenAI’s ChatGPT-4o – were compared directly with official government sources.
The results showed many correct answers, but also a significant variation in quality, particularly for specialised or less-common queries.
They also showed that chatbots rarely admitted when they didn’t know the answer to a question, and attempted to answer every query even when their responses were incomplete or wrong.
Burying key facts
Chatbots also often provided lengthy responses that buried key facts or extended beyond the information available on government websites, increasing the risk of inaccuracy.
Meta’s Llama 3.1 8B stated that a court order is essential to add an ex-partner’s name to a child’s birth certificate. If followed, this advice would lead to unnecessary stress and financial cost.
ChatGPT-OSS-20B incorrectly advised that a person caring for a child whose parents have died is only eligible for Guardian’s Allowance if they are the guardian of a child who has died.
It also incorrectly stated that the applicant was ineligible if they received other benefits for the child.
Simperl said that for citizens, the research highlights the importance of AI literacy, while for those designing public services, “it suggests caution in rushing towards large or expensive models, and emphasises the need to avoid vendor lock-in, given how quickly the technology is developing. We also need more independent benchmarks, more public testing, and more research into how to make these systems produce precise and reliable answers.”
The second International AI Safety Report, published on 3 February, made similar findings regarding the reliability of AI-powered systems, noting that while there have been improvements in recalling factual information since the 2025 safety report, “even leading models continue to give confident but incorrect answers at significant rates”.
Following incorrect advice
The report also highlighted users’ propensity to follow incorrect advice from automated systems generally, including chatbots, “because they overlook cues signalling errors or because they perceive the automation system as superior to their own judgement”.
The ODI’s research also challenges the idea that larger, more resource-intensive models are always a better fit for the public sector, with smaller models delivering comparable results at a lower cost than large, closed-source models such as ChatGPT in many cases.
Simperl warned that governments should avoid locking themselves into long-term contracts when models temporarily outperform one another on price or benchmarks.
Commenting on the ODI’s research during a launch event, Andrew Dudfield, head of AI at Full Fact, highlighted that because the government’s position is pro-innovation, regulation is currently framed around principles rather than detailed rules.
“The UK may be adopting AI faster than it is learning how to use it, particularly when it comes to accountability,” he said.
Trustworthiness
Dudfield noted that what makes this work compelling is that it focuses on real user needs, but that trustworthiness needs to be evaluated from the perspective of the person relying on the information, not from the perspective of demonstrating technical capability.
“The real risk is not only hallucination, but the extent to which people trust plausible-sounding responses,” he said.
Asked at the same event if the government should be building its own systems or relying on commercial tools, Richard Pope, researcher at the Bennett School of Public Policy, said the government needs “to be cautious about dependency and sovereignty”.
“AI projects should start small, grow gradually and share what they are learning,” he said, adding that public sector projects should prioritise learning and openness rather than rapid expansion.
Simperl highlighted that AI creates the potential to tailor information for different languages or levels of understanding, but that those opportunities “need to be shaped rather than left to develop without guidance”.
With new AI models launching every week, a January 2026 Gartner study found that the increasingly large volume of unverified and low-quality data generated by AI systems was a clear and present threat to the reliability of LLMs.
Large language models are trained on scraped data from the web, books, research papers and code repositories. While many of these sources already contain AI-generated data, at the current rate of expansion, they may all be populated with it.
Highlighting how future LLMs will be trained more and more with outputs from current ones as the volume of AI-generated data grows, Gartner said there is a risk of models collapsing entirely under the accumulated weight of their own hallucinations and inaccurate realities.
Managing vice-president Wan Fui Chan said that organisations could no longer implicitly trust data, or assume it was even generated by a human.
Chan added that as AI-generated data becomes more prevalent, regulatory requirements for verifying “AI-free” data will intensify in many regions.
Tech
Our Favorite TV Is Still Almost Half Off
Last week I covered the best TV deals you could find ahead of Bad Bunny’s halftime show, and one of them, the TCL QM6K, has somehow remained at this impressively discounted price. You can grab the 65-inch QM6K from Best Buy for $530, only $30 short of a 50% markdown, with discounts on the larger versions of the screen as well.
While there are certainly fancier models with higher brightness and price tags to match, we think the QM6K offers a balanced image at a great price point. It has incredible black levels that help create a deep, high-contrast picture. The panel is impressively uniform too, with almost no blooming across the entire screen. It maintains great details in darker areas, even with the preset on Dolby Vision Dark. Colors are natural and vivid, adding to the realism and immersion of the experience.
It’s great for gaming too, with a refresh rate that you can push all the way to 144Hz, and you’ll get VRR (Variable Refresh Rate) and ALLM (Auto Low Latency Mode) on two of the four HDMI ports. Speaking of ports, TCL includes a separate port just for eARC, which means you can save those important HDMI 2.1 ports for your gaming consoles. TCL even advertises a new “Zero-Delay Transient Response” mode that will make sure your games are all perfectly lag free.
It also uses Google TV, which is one of the easier-to-use and snappier smart interfaces available. It even has hands-free Google Assistant, if you often have trouble finding the remote. In addition to the full suite of Google TV apps, it’s compatible with plain old Chromecast and Apple AirPlay.
The most reasonable 65-inch version of the screen is currently marked down to $530 at Best Buy, but if you’re dreaming a little bigger, there’s a similar $500 discount on the 75-inch version that brings the price down to just $800. You can also save $900 on the downright gigantic 85-inch model, but if you’re considering going that large, you might check out some of our other favorite TVs first for more premium options.
Tech
Elon Musk’s X Appears to Be Violating US Sanctions by Selling Premium Accounts to Iranian Leaders
In recent weeks, Elon Musk has followed President Donald Trump’s lead, slamming Iranian government officials and supporting the thousands of protesters railing against the regime. He even provided free access to his Starlink satellites in the midst of a nationwide internet blackout.
But while Musk publicly proclaims his support for the protesters, his company X appears to be profiting from the very same government officials he railed against, potentially violating US sanctions in the process, according to a new report from the Tech Transparency Project (TTP) shared exclusively with WIRED.
TTP identified more than two dozen X accounts allegedly run by Iranian government officials, state agencies, and state-run news outlets, which display a blue check mark, indicating they have access to X’s premium service. These accounts were sharing state-sponsored propaganda at a time when ordinary Iranians had no access to the internet, and their messages appeared to be artificially boosted to increase reach and engagement, which is a key aspect of X’s premium service. An X Premium subscription, which is the only way to receive a blue check mark, costs $8 a month, while a Premium+ subscription, which removes ads and boosts reach even further, costs $40 a month.
At a time when the Trump administration is threatening Iran with possible military action if it does not meet demands related to nuclear enrichment and ballistic missiles, X appears to be undermining those efforts by providing a social media bullhorn for the Iranian government to spread its message.
“The fact that Elon Musk is not just platforming these individuals, but taking their money to boost their content through these premium subscriptions and give them extra features also means he’s undermining the sanctions that the US and the Trump administration are actually applying,” Katie Paul, the director of the TTP, tells WIRED.
X did not respond to a request for comment, but within hours of WIRED flagging several X accounts belonging to Iranian officials, their blue check marks were removed. The rest of the accounts identified by TTP but not shared with X continue to display a blue check mark.
The White House directed WIRED to the Treasury when asked for comment. A Treasury spokesperson said the department does not comment on specific allegations but that it “take[s] allegations of sanctionable conduct extremely seriously.”
Protests broke out in the Iranian capital of Tehran on December 28 over the continuing devaluation of the Iranian rial against the dollar and a widespread economic crisis in the country. Over the following days, tens of thousands of protesters poured onto the streets in cities across the country, calling for regime change and the end of Supreme Leader Ayatollah Ali Khamenei’s 37-year reign.
In response, the regime brutally cracked down on protesters, arresting tens of thousands of people and killing thousands more. The true death toll is still unknown but could be much higher than currently reported.
Trump signaled his support for the protesters in a post on Truth Social on January 2, promising to come to their rescue. “We are locked and loaded and ready to go,” he wrote. Musk quickly followed Trump, calling Khamenei “delusional.”
On January 5, Gholamhossein Mohseni-Ejei, the head of Iran’s judiciary, who had a blue check mark at the time, wrote in a post on X, “This time, we will show no mercy to the rioters.” Ejei was among the accounts whose blue check marks were removed on Wednesday after WIRED contacted the company.
A few days later, X changed the Iranian flag emoji on the platform to one used before the 1979 revolution, featuring a lion and sun. On January 14, Musk announced that anyone with a Starlink device would be free to access the internet in Iran without a subscription. At the time, Starlink devices were the only viable way of getting online after the government imposed a near-total internet blackout.
Tech
I Used TurboTax’s Mobile App to File My Taxes for Free
I’ve used TurboTax to file my taxes for several years. It’s the most popular DIY tax service, and also often the cheapest and arguably most straightforward. TurboTax keeps the filer in mind with an easy-to-use interface; available expert help; different options for document auto-upload; helpful tips and information regarding tax requirements; and transparent, low-cost options for every type of filer.
The service makes it super easy for returning users by storing previous years’ information, allowing easy auto-upload, and remembering choices and previously used forms from years past. Doing my taxes as a returning user with TurboTax takes a fraction of the time it takes with other tax services I have tested. (Need a jumping-off point? I’ve got a guide on how to file your taxes online for extra help.)
Yes, You Can Actually File for Free
If you haven’t tried TurboTax, this is the best time to see if it’s the right fit for you (and be able to file for free). You can file both state and federal taxes for $0 right now. There are only a few requirements for this awesome free filing deal. You must not have filed with TurboTax before (and must be switching from another provider), and you must file in the TurboTax mobile app by February 28. You’ll need to both start and file within the mobile app; the deal is only available for DIY (self-guided) tax services and excludes expert assist products. This means that it applies to Simple Form 1040 returns only (meaning no schedules, except that EITC, CTC, student loan interest, and Schedule 1-A forms are eligible).
One of the downsides to TurboTax is that while it has (in my opinion) the easiest-to-use interface with seamless auto-upload features, it can be a bit more expensive than similar competitors. I’ve used FreeTaxUSA in the past, when my income was lower and my taxes were simpler. The service is very similar in design to TurboTax, and while it is still a low-cost option, it’s not completely 100 percent free, as it charges $16 for filing a state return. Plus, when I tested the service last year, FreeTaxUSA gave me the highest amount of taxes owed of all the services I tested.
TurboTax filed more than twice as many free returns as FreeTaxUSA last year (based on the total number of federal and state returns filed in Tax Year 2024). And this tax season, more than 100 million people in the US are eligible for free filing with TurboTax. If you file your own federal and state returns using DIY TurboTax products, filing will be free if you use the mobile app until February 28.
Filing in Your Hands
Filing taxes can be confusing and potentially expensive. While I urge anyone who hasn’t filed with TurboTax to take advantage of the free federal and state filing deal through the mobile app, there are several options if you have filed with the service before or have more complicated returns that may require additional assistance.
There are three options for filers: DIY, where you file yourself with step-by-step instructions (the previously mentioned service eligible for the free filing deal); Expert Assist, where you get help from tax experts throughout the process and have the expert review your return before submitting; or Expert Full Service, where a local tax expert does your taxes for you completely. Prices vary based on the chosen tier and when you file (the earlier, the cheaper, especially if you’re able to file before March).
The filing process starts out with a helpful questionnaire so that the program knows which sections are applicable to you, like dependents, assets, and education, so you’re not slogging through things that aren’t relevant. At the beginning, TurboTax also estimated the time it’d take to finish and asked how I filed last year. No other service I previously tested did either, and the time estimate helped me plan for how long the process would take.
