Tech
AI systems are great at tests. But how do they perform in real life?

Earlier this month, when OpenAI released its latest flagship artificial intelligence (AI) system, GPT-5, the company said it was “much smarter across the board” than earlier models. Backing up the claim were high scores on a range of benchmark tests assessing domains such as software coding, mathematics and health care.
Benchmark tests like these have become the standard way we assess AI systems—but they don’t tell us much about the actual performance and effects of these systems in the real world.
What would be a better way to measure AI models? A group of AI researchers and metrologists—experts in the science of measurement—recently outlined a way forward.
Metrology matters here because we need ways not only to ensure the reliability of the AI systems we may increasingly depend upon, but also to measure their broader economic, cultural, and societal impact.
Measuring safety
We count on metrology to ensure the tools, products, services, and processes we use are reliable.
Take something close to my heart as a biomedical ethicist—health AI. In health care, AI promises to improve diagnoses and patient monitoring, make medicine more personalized and help prevent diseases, as well as handle some administrative tasks.
These promises will only be realized if we can be sure health AI is safe and effective, and that means finding reliable ways to measure it.
We already have well-established systems for measuring the safety and effectiveness of drugs and medical devices, for example. But this is not yet the case for AI—not in health care, or in other domains such as education, employment, law enforcement, insurance, and biometrics.
Test results and real effects
At present, most evaluation of state-of-the-art AI systems relies on benchmarks. These are tests that aim to assess AI systems based on their outputs.
They might answer questions about how often a system’s responses are accurate or relevant, or how they compare to responses from a human expert.
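To make output-based scoring concrete, here is a minimal Python sketch of the simplest kind of benchmark metric, exact-match accuracy. The function name and the case-insensitive matching rule are assumptions for illustration. Real benchmarks often use more forgiving normalization, accept several reference answers, or rely on expert-graded rubrics.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of items where the model's answer matches the reference answer.

    Hypothetical, simplified scorer: answers are compared after trimming
    whitespace and lowercasing, and nothing else.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        raise ValueError("need at least one item to score")
    hits = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Two of the three answers match, so the score is 2/3.
score = exact_match_accuracy(["Paris", " 4 ", "blue"], ["paris", "4", "red"])
```

A number like this only says how often outputs matched the answer key; it carries no information about how those outputs affect people once a system is deployed.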
There are literally hundreds of AI benchmarks, covering a wide range of knowledge domains.
However, benchmark performance tells us little about the effect these models will have in real-world settings. For this, we need to consider the context in which a system is deployed.
The problem with benchmarks
Benchmarks have become very important to commercial AI developers to show off product performance and attract funding.
For example, in April this year a young startup called Cognition AI posted impressive results on a software engineering benchmark. Soon after, the company raised US$175 million (A$270 million) in funding in a deal that valued it at US$2 billion (A$3.1 billion).
Benchmarks have also been gamed. Meta seems to have adjusted some versions of its Llama-4 model to optimize its score on a prominent chatbot-ranking site. After OpenAI’s o3 model scored highly on the FrontierMath benchmark, it came out that the company had had access to the dataset behind the benchmark, raising questions about the result.
The overall risk here is known as Goodhart’s law, after British economist Charles Goodhart: “When a measure becomes a target, it ceases to be a good measure.”
In the words of Rumman Chowdhury, who has helped shape the development of the field of algorithmic ethics, placing too much importance on metrics can lead to “manipulation, gaming, and a myopic focus on short-term qualities and inadequate consideration of long-term consequences”.
Beyond benchmarks
So if not benchmarks, then what? Let’s return to the example of health AI. The first benchmarks for evaluating the usefulness of large language models (LLMs) in health care made use of medical licensing exams. These are used to assess the competence and safety of doctors before they’re allowed to practice in particular jurisdictions.
State-of-the-art models now achieve near-perfect scores on such benchmarks. However, these have been widely criticized for not adequately reflecting the complexity and diversity of real-world clinical practice.
In response, a new generation of “holistic” frameworks has been developed to evaluate these models across more diverse and realistic tasks. For health applications, the most sophisticated is the MedHELM evaluation framework, which includes 35 benchmarks across five categories of clinical tasks, from decision-making and note-taking to communication and research.
What better testing would look like
More holistic evaluation frameworks such as MedHELM aim to avoid these pitfalls. They have been designed to reflect the actual demands of a particular field of practice.
However, these frameworks still fall short of accounting for the ways humans interact with AI systems in the real world. And they don’t even begin to come to terms with the systems’ impacts on the broader economic, cultural, and societal contexts in which they operate.
For this we will need a whole new evaluation ecosystem. It will need to draw on expertise from academia, industry, and civil society with the aim of developing rigorous and reproducible ways to evaluate AI systems.
Work on this has already begun. There are methods for evaluating the real-world impact of AI systems in the contexts in which they’re deployed—things like red-teaming (where testers deliberately try to produce unwanted outputs from the system) and field testing (where a system is tested in real-world environments). The next step is to refine and systematize these methods, so that what actually counts can be reliably measured.
If AI delivers even a fraction of the transformation it’s hyped to bring, we need a measurement science that safeguards the interests of all of us, not just the tech elite.
More information:
Reva Schwartz et al, Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI’s Real World Effects, arXiv (2025). DOI: 10.48550/arxiv.2505.18893
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Citation:
AI systems are great at tests. But how do they perform in real life? (2025, August 25)
retrieved 25 August 2025
from https://techxplore.com/news/2025-08-ai-great-real-life.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
Your Cat Probably Isn’t Drinking Enough Water. A Fountain Can Help.

Others We Tested
Petkit Eversweet Max for $90: This techy automatic fountain can run either plugged in or on battery power (lasting up to 83 days), and the drinking bowl is made of stainless steel, but the reservoir is plastic. Because of the shape of the basin, with its chunky battery and reservoir bowl, it’s a little awkward to clean. The app logs every time a pet drinks and compares it over time to determine whether your cat’s drinking habits have changed. The app also keeps track of when the filter needs replacing and when you last added water. However, it doesn’t monitor or show you how much water is left in the basin; you have to check manually. The design also makes it a bit awkward to refill.
Enabot Rola Smart Pet Water Fountain for $50: This automatic fountain is cordless and runs on a rechargeable battery that lasts up to 60 days (although it can stay plugged in too). It has a wireless pump that uses magnetic induction; this pump was one of the easiest and most hassle-free to clean of all I tested. The fountain has a stainless steel top that holds a decent amount of water even when not running. The tank is plastic, though, and I’m wary of plastic now because of its propensity to harbor bacteria (plus it doesn’t keep water as cold). The app reminds you when the water’s low, the fountain needs cleaning, or the filter needs replacing, and the fountain automatically stops dispensing water and sends you a refill reminder through the app. It also logs the number of times your pet drank and for how long, monitoring hydration patterns over time and comparing the stats to average usage. One complaint: this fountain wouldn’t stay in Continuous stream mode, even when plugged in, and instead switched automatically to Sensing stream mode.
Homerunpet Wireless Pet Fountain for $60: This cordless fountain can be used as a traditional fountain plugged in on its base, or can be detached and moved around the house with 30 days of battery life. I don’t love that this fountain is all plastic, but it’s easy to see water levels from the outside, the top and filter layers are super easy to remove, clean, and replace, and the wireless (basically silent) pump makes it a whole lot easier to clean. Plastic doesn’t keep the water as cold or clean as stainless steel, so you’ll have to clean it a lot more often. The fountain only begins bubbling when a cat (or human) approaches to save battery power, and there’s no option to control the flow (and no connected app). I like the wireless pump, but I’m really over plastic at this point.
Wonder Creature Cat Fountain for $20: My two cats have cycled through several water fountains over the past few years, but this no-nonsense version has been a stalwart. The inside is lit by a blue LED (bright enough to glow in the dark), and a clear viewing window on the side makes it easy to monitor the water level from afar. I also like the dishwasher-safe metal bowl and the fact you can remove parts of the yellow and white “flower” to create a fountain configuration your cat likes (waterfall, low bubble-up, tall bubble-up). The only major downsides are the fact it requires very frequent cleaning and filter changes due to the plastic body, and that there is no reservoir to hold water in case of power outage or pump malfunction. When I go on vacation I have to swap it out for an old-school gravity dispenser. —Kat Merck
Happy & Polly Gothic Cat Drinking Fountain for $60: If you prefer gothic decor to neutral blandness, this ghostly ceramic cat fountain from Happy & Polly may tempt you to bite. The water bubbles up out of the top of the ghost and pools on the ceramic top. It’s fairly quiet at around 35 decibels, but it gets loud when the water is running low, and I worry about the motor burning out, as the 1.5-liter capacity can run dry fast. While the ceramic finish is easy to clean, it is fiddly to take apart. You will want to clean it once a week to prevent it from becoming slimy, and you must change the filter once a month. —Simon Hill
Petkit Eversweet Solo 2 for $45: I love three key features of this fountain: The bowl sits on top of a wireless charging base, so you don’t have to fiddle with cables, it is super easy to clean, and it’s very quiet at around 25 decibels. A flashing light warns you when the water is running low, and you can check when the filter needs to be changed in the app. There’s an optional smart mode that pumps intermittently and a night mode to turn the light off. Pleasingly, all three of our cats drink from this fountain, though that does mean I have to refill it often, as it only holds 2 liters. —Simon Hill
Oneisall Stainless Steel Pet Fountain for $50: This drinking fountain is about as simple as they come. As it’s designed for cats or small dogs, it has a large bowl, but some cats will prefer that. I love the mostly stainless steel construction, as it’s easy to keep clean and less prone to dirt and bacteria buildup. You can even stick parts into the dishwasher to clean. This fountain can also hold up to 7 liters of water, so you don’t have to refill as often. It’s fairly quiet at around 35 decibels, but it gets louder when the water is running low (a red light warns you when it needs a refill). You should clean once a week and rinse the filter. The filter packs are relatively affordable at $15 for a pack of eight, and you need to swap them once a month. —Simon Hill
Petlipo Cordless Cat Water Fountain for $57: This all-plastic pet fountain sits on a dock for easier tank cleaning. It’s rechargeable for up to 60 days of cordless power and has a wire-free pump, a large 2.6-liter capacity, and three customizable water flow modes (induction, timer, and continuous flow). The heavy-duty filter is encased in a plastic cage and only needs to be replaced every four to five weeks. I had no issues while using this fountain, but at nearly $60, that’s egregiously expensive for a fountain made of cheap (and bacteria-harboring) plastic. Although it’s a solid fountain, I’d spend less and grab one of the stainless steel picks.
Not Recommended
Petcube Ceramic Pet Water Fountain for $90: I really wanted to love this fountain. Although its basin is plastic, it has a ceramic top, which is more hygienic than plastic (and I had never tested a ceramic model before), and the brand makes some of my favorite pet cameras. However, setup was a bit confusing, it took a long time to charge the base that powers the fountain’s water flow, and the sensor that starts the water flow is triggered from only one side, making placement awkward. After a few days, it would run only while plugged in; soon its stream was barely strong enough to reach the top, and after just over a week it stopped working altogether. It’s also egregiously expensive for a pet fountain.
Cat Mate 3-Level Pet Fountain for $28: This tall automatic Cat Mate fountain sets itself apart with three tiers for cats who like to drink at every level. Cleaning the motor requires disassembly with tools and extended soaking. Because the water has to travel such a long distance, evaporation meant I had to refill it about every other day. Plastic also harbors bacteria, and previous plastic models I’ve owned have had mold issues. The basin is quite large and sits flat, so some debris settled in the bottom and front of the basin rather than moving back to the filter system behind. The plastic material and the lack of an ergonomic gravity design made this fountain dirtier than others.
Whisker City Free Fall Cat Fountain for $30: This huge fountain is better suited to dogs, with a large 150-fluid-ounce bowl and a waterfall design. Although the basin has a small splash pad to help offset the waterfall noise, this was one of the loudest fountains I tested. Evaporation from the waterfall-style system also meant I had to refill it every other day. Because of the fountain’s structure, my cats had to bend their heads at an awkward angle, so they tended to avoid drinking from the basin, and their heads got slightly wet from the waterfall’s splatter. The basin is also not angled, so crumbs and debris sit at the bottom of the bowl.
Petkit EverSweet Solo SE for $26: This very simple, straightforward fountain has a square-shaped body, is translucent so you can easily see water levels from the outside, and has a nearly silent 25-decibel cordless pump that circulates water from the basin to the top level, where 60 milliliters of water is always available for drinking, even in case of power failure. The basin sits on a base and all parts detach easily, making it easier to clean. This fountain doesn’t have multiple modes or an associated app; you’ll have to check water levels manually. I noticed its water wasn’t as cold as some of the others’, and because of the design of the top, debris often pooled in the dipped areas, so I had to clean it often.
I used each of these for a week as my cats’ main source of water. As mentioned, I noted the ease of setting up, evaluated parts and filters, and generally compared the various types of water fountains—spigot, bubbling, or waterfall. Some flows were continuous and some were intermittent (my cats didn’t prefer intermittent). Cats may also be intrigued and want to play with the machine rather than drink, so be sure to give them time and keep another water source around until they are fully adjusted to the new gadget.
Cats sometimes struggle to consume enough water, which can lead to potentially lethal UTIs and blockages, especially in male cats. This is one reason vets increasingly encourage owners to feed their cats at least a partially wet diet, which helps them take in more moisture, since cats don’t naturally drink as much water as dogs. Unlike dogs, cats are generally quite particular in their likes and dislikes, and they can see stagnant water as potentially harmful. (In the wild, stagnant water is more likely to harbor harmful bacteria.) Cats are more drawn to moving water in nature, and these fountains encourage them to drink more by emulating what they’re naturally drawn to.
While automatic water fountains are better for your cats’ overall water consumption, they do require a bit more work and money. Rather than refilling a bowl, these take a little more elbow grease—but it’s worth it for your cat’s health. Along with routine refilling and cleaning, you’ll need to disassemble the fountain to clean all parts, including using a brush for the bowl and tubes. You may also have to disassemble the motor to deep-clean because of mineral buildup. These also have different types of filtration cartridges in specific shapes for the brand’s fountains, which require you to buy and change out filters, usually monthly but sometimes more often.
Let’s be honest, a lot of these fountains are pretty much the same. I looked especially for the overall design—I am a fiend for stainless steel because of the potential of porous plastic harboring harmful bacteria. I also favor a wide reservoir without high sides to help reduce the chance for whisker fatigue. I prefer fountains that have a small basin reservoir of water available at all times, in case of low water levels or power failure. I took into account ease of setting up, refilling, and cleaning, as well as overall design. And of course, there were some that my cats took to straight away, and some they didn’t seem to favor as much.
After prolonged testing, I now look for these three things and encourage you to as well: a cordless pump for easier (and safer) cleaning, stainless steel construction so it’s more hygienic, and a window to monitor water levels (especially if it’s not connected to an app).
Want to Start a Website? These Are the Best Website Builders

Top Website Builders
Publishing a website is still more complicated than it has any right to be, but the best website builders streamline the process. Instead of juggling a bunch of files on a server and learning the ins and outs of networking, website builders do exactly what’s written on the tin. Piece by piece, using a drag-and-drop interface, you can design your website the way you want with immediate feedback, rather than spending time buried in code and hoping it comes out on the other end.
There are dozens of website builders, and most of them range from decent to straight-up bad. Any web host with a bit of ambition has a website builder floating around, even if it’s slow, clunky, and lacking features. I focused on finding the best tools for building your website that go beyond just an add-on, and these are my favorites. If you’re after something simpler than a full-blown website, check out our list of the Best Portfolio Websites.
Best Website Builder for Most People
You’ve heard of Squarespace over and over again, I’m sure, and that’s not an accident. It’s an inviting website builder that made a name for itself with bold, striking templates. Beneath the veneer of attractive, but seemingly simple, websites, you’ll find one of the most capable website builders on the market. That balance of power and usability is what sets Squarespace apart.
It feels like a creative tool. Where other website builders lag and stutter to get a new element on your page, Squarespace feels fluid. Your dashboard gives you quick access to edit your site, and around every corner, Squarespace feels designed so you never have to look up a tutorial. I started a simple photography website, and within an hour, I had a custom course page set up, an appointment schedule with automated confirmation emails, and services (with pricing and the ability to accept payments) configured.
Squarespace isn’t cheap, but it also doesn’t bother with restrictive, low-cost plans. Even on the Basic plan, you get ecommerce tools and space for multiple contributors.
Best Cheap Website Builder
Hostinger is better known as a web hosting provider, but it has a surprisingly robust website builder that you can use on its own or for free as part of a hosting package. You don’t get the same world-class template design and dense feature-set of a more expensive builder like Squarespace, but that’s OK. Hostinger’s website builder will run you just a few bucks a month, and based on my testing, it feels heavily angled toward newcomers.
You sacrifice some power for convenience, but there’s an awful lot you can accomplish with Hostinger. Integrations with PayPal, Stripe, and Square let you quickly set up ecommerce, a WhatsApp add-on gives you live chat capabilities, and Printful support means you can sell print-on-demand merchandise. And if you outgrow the website builder, Hostinger lets you export your website’s content to WordPress.
Where Hostinger wins for me is through its AI tools. Just about every website builder these days has AI integrated in some way, but it’s around every corner at Hostinger. You need to pay extra for some of these AI features—the logo generator, for example, requires credits—but they give you a great starting point for mocking up the look, feel, and tone of your website.
Best for Small Businesses
Wix is undoubtedly the biggest competitor to Squarespace, and I had a hard time putting one above the other. Ultimately, Wix ended up in the backseat due to higher prices and a slightly less intuitive interface. That’s partly because of how powerful Wix is. Rather than corral you in an elegant (if restrictive) website-building workflow, Wix gives you a ton of options.
First, templates. You get a few hundred elsewhere, but Wix offers over 2,000 templates. At the time of writing, there are 223 pages of them on Wix’s website. They aren’t all winners, but I was able to mock up a quick photography portfolio website within a few minutes by browsing the templates and uploading a few photos.
A New Algorithm Makes It Faster to Find the Shortest Paths

The original version of this story appeared in Quanta Magazine.
If you want to solve a tricky problem, it often helps to get organized. You might, for example, break the problem into pieces and tackle the easiest pieces first. But this kind of sorting has a cost. You may end up spending too much time putting the pieces in order.
This dilemma is especially relevant to one of the most iconic problems in computer science: finding the shortest path from a specific starting point in a network to every other point. It’s like a souped-up version of a problem you need to solve each time you move: learning the best route from your new home to work, the gym, and the supermarket.
“Shortest paths is a beautiful problem that anyone in the world can relate to,” said Mikkel Thorup, a computer scientist at the University of Copenhagen.
Intuitively, it should be easiest to find the shortest path to nearby destinations. So if you want to design the fastest possible algorithm for the shortest-paths problem, it seems reasonable to start by finding the closest point, then the next-closest, and so on. But to do that, you need to repeatedly figure out which point is closest. You’ll sort the points by distance as you go. There’s a fundamental speed limit for any algorithm that follows this approach: You can’t go any faster than the time it takes to sort.
Forty years ago, researchers designing shortest-paths algorithms ran up against this “sorting barrier.” Now, a team of researchers has devised a new algorithm that breaks it. It doesn’t sort, and it runs faster than any algorithm that does.
“The authors were audacious in thinking they could break this barrier,” said Robert Tarjan, a computer scientist at Princeton University. “It’s an amazing result.”
The Frontier of Knowledge
To analyze the shortest-paths problem mathematically, researchers use the language of graphs—networks of points, or nodes, connected by lines. Each link between nodes is labeled with a number called its weight, which can represent the length of that segment or the time needed to traverse it. There are usually many routes between any two nodes, and the shortest is the one whose weights add up to the smallest number. Given a graph and a specific “source” node, an algorithm’s goal is to find the shortest path to every other node.
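These definitions can be sketched directly in Python: a graph stored as an adjacency map of weighted links, a route’s length as the sum of its link weights, and the shortest route as the one with the smallest sum. The node names and weights are made up. Enumerating every route by brute force is only there to illustrate the definition; the number of routes can grow exponentially, and practical algorithms do not work this way.

```python
from typing import Dict, List

# Weighted, directed graph: node -> {neighbor: weight}. Example data is hypothetical.
Graph = Dict[str, Dict[str, float]]


def path_weight(graph: Graph, path: List[str]) -> float:
    """Add up the weights of consecutive links along a route."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def all_simple_paths(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """List every route from start to goal that never revisits a node."""
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        node = path[-1]
        if node == goal:
            paths.append(list(path))
            return
        for nxt in graph.get(node, {}):
            if nxt not in path:  # skip cycles
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths


graph: Graph = {
    "home": {"gym": 4.0, "cafe": 1.0},
    "cafe": {"gym": 2.0, "work": 6.0},
    "gym": {"work": 3.0},
    "work": {},
}

routes = all_simple_paths(graph, "home", "work")
best = min(routes, key=lambda p: path_weight(graph, p))
```

Here there are three routes from home to work, with total weights 7, 6, and 7, so the shortest is home, cafe, gym, work even though it uses the most links.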
The most famous shortest-paths algorithm, devised by the pioneering computer scientist Edsger Dijkstra in 1956, starts at the source and works outward step by step. It’s an effective approach, because knowing the shortest path to nearby nodes can help you find the shortest paths to more distant ones. But because the end result is a sorted list of shortest paths, the sorting barrier sets a fundamental limit on how fast the algorithm can run.
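Here is a minimal Python sketch of Dijkstra’s approach, using the standard library’s binary heap (`heapq`) as the priority queue that repeatedly answers “which unsettled node is closest?” The heap choice, names, and example graph are assumptions for illustration. This is the textbook method, not the new algorithm described above.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Weighted, directed graph: node -> {neighbor: non-negative weight}.
Graph = Dict[str, Dict[str, float]]


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], List[str]]:
    """Return (shortest distance to each reachable node, order nodes were settled).

    Assumes every weight is non-negative, which Dijkstra's method requires.
    """
    dist: Dict[str, float] = {source: 0.0}
    settled: Set[str] = set()
    order: List[str] = []
    frontier: List[Tuple[float, str]] = [(0.0, source)]  # min-heap of (distance, node)

    while frontier:
        # Pop the closest candidate: this is the sorting-like work done each step.
        d, node = heapq.heappop(frontier)
        if node in settled:
            continue  # stale entry: a shorter route to this node was already found
        settled.add(node)
        order.append(node)
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return dist, order


graph: Graph = {
    "home": {"gym": 4.0, "cafe": 1.0},
    "cafe": {"gym": 2.0, "work": 6.0},
    "gym": {"work": 3.0},
    "work": {},
}
distances, settled_order = dijkstra(graph, "home")
```

Nodes come off the heap in order of increasing distance (home, cafe, gym, work at distances 0, 1, 3, and 6), which is the sorted output that ties the algorithm’s speed to the cost of sorting.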