Tech
Can LLMs understand scientists? | Computer Weekly
The use of large language models (LLMs) as an alternative to search engines and recommendation algorithms is increasing, but early research suggests there is still a high degree of inconsistency and bias in the results these models produce. This has real-world consequences as LLMs play a greater role in our decision-making.
Making sense of algorithmic recommendations is tough. In the past, we had entire industries dedicated to understanding (and gaming) the results of search engines – but the level of complexity of what goes into our online recommendations has risen several times over in just a matter of years. The massive diversity of use cases for LLMs has made audits of individual applications vital in tackling bias and inaccuracies.
Scientists, governments and civil society are scrambling to make sense of what these models are spitting out. A group of researchers at the Complexity Science Hub in Vienna has been looking at one area in particular where these models are being used: identifying scholarly experts. Specifically, these researchers were interested in which scientists are being recommended by these models – and which are not.
Lisette Espín-Noboa, a computer scientist working on the project, had been looking into this before major LLMs had hit the market: “In 2021, I was organising a workshop, and I wanted to come up with a list of keynote speakers.” First, she went to Google Scholar, an open-access database of scientists and their publications. “[Google Scholar] rank them by citations – but for several reasons, citations are biased.”
This meant trawling through pages and pages of male scientists. Some fields of science are simply more popular than others, with researchers having more influence purely due to the size of their discipline. Another issue is that older scientists – and older pieces of research – will naturally have more citations simply for being around longer, rather than the novelty of their findings.
“It’s often biased towards men,” Espín-Noboa points out. Even with more women entering the profession, most scientific disciplines have been male-dominated for decades.
Daniele Barolo, another researcher at the Complexity Science Hub, describes this as an example of the Matthew Effect. “If you sort the authors only by citation counts, it’s more likely they will be read and therefore cited, and this will create a reinforcement loop,” he explains. In other words, the rich get richer.
Espín-Noboa continues: “Then I thought, why don’t I use LLMs?” These tools could also fill in the gaps by including scientists that aren’t on Google Scholar.
But first, they would have to understand whether these were an improvement. “We started doing these audits because we wanted to know how much they knew about people, [and] if they were biased towards men or not,” Espín-Noboa says. The researchers also wanted to see how accurate the tools were and whether they displayed any biases based on ethnicity.
Auditing
They came up with an experiment which would test the recommendations given by LLMs along various lines, narrowing their requests to scientists published in the journal of the American Physical Society. They asked these LLMs for various recommendations, such as naming the most important scientists in certain fields or identifying experts from certain periods of time.
While they couldn’t test for the absolute influence of a scientist – no such “ground truth” for this exists – the experiment did surface some interesting findings. Their paper, which is currently available as a preprint, suggests Asian scientists are significantly underrepresented in the recommendations provided by LLMs, and that existing biases against female authors are often replicated.
Despite detailed instructions, in some cases these models would hallucinate the names of scientists, particularly when asked for large lists of recommendations, and would not always be able to differentiate between varying fields of expertise.
“LLMs cannot be seen directly as databases, because they are linguistic models,” Barolo says.
One test was to prompt the LLM with the name of a scientist and to ask it for someone of a similar academic profile – a “statistical twin”. But when they did this, “not only scientists that actually work in a similar field were recommended, but also people with a similar looking name,” adds Barolo.
As with all experiments, there are certain limitations: for a start, this study was only conducted on open-weight models. These have a degree of transparency, although not as much as fully open-source models. Users are able to set certain parameters and to modify the structure of the algorithms used to fine-tune their outputs. By contrast, most of the largest foundation models are closed-weight ones, with minimal transparency and opportunities for customisation.
But even open-weight models come up against issues. “You don’t know completely how the training process was conducted and which training data was used,” Barolo points out.
The research was conducted on versions of Meta’s Llama models, Google’s Gemma (a more lightweight model than their flagship Gemini) and a model from Mistral. Each of these has already been superseded by newer models – a perennial problem for carrying out research on LLMs, as the academic pipeline cannot move as quickly as industry.
Aside from the time needed to execute research itself, papers can be held up for months or years in review. On top of this, a lack of transparency and the ever-changing nature of these models can create difficulties in reproducing results, which is a crucial step in the scientific process.
An improvement?
Espín-Noboa has previously worked on auditing more low-tech ranking algorithms. In 2022, she published a paper analysing the impacts of PageRank – the algorithm which arguably gave Google its big breakthrough in the late 1990s. It has since been used by LinkedIn, Twitter and Google Scholar.
PageRank was designed to make a calculation based on the number of links an item has in a network. In the case of webpages, this might be how many websites link to a certain site; or for scholars, it might make a similar calculation based on co-authorships.
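The link-based calculation described above can be illustrated with a minimal sketch of the textbook PageRank iteration. This is a simplified illustration, not Google's production system nor the variant analysed in Espín-Noboa's paper; the function name `pagerank`, the damping factor of 0.85 and the four-node toy network are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores.

    links maps each node to the list of nodes it links to
    (web pages linking to pages, or scholars linked to co-authors).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start with equal scores

    for _ in range(iterations):
        # Every node keeps a small baseline score regardless of its links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its score on, split evenly across its links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its score over everyone.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy network: A, B and D all link to C; C links back to A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_links)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy network, C ranks first because three nodes link to it, and A ranks second purely because the well-connected C links to it. That reinforcement is one way a link-based ranking can concentrate visibility on already well-connected nodes.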
Espín-Noboa’s research shows the algorithm has its own problems – it may serve to disadvantage minority groups. Despite this, PageRank is still fundamentally designed with recommendations in mind.
In contrast, “LLMs are not ranking algorithms – they do not understand what a ranking is right now”, says Espín-Noboa. Instead, LLMs are probabilistic – making a best guess at a correct answer by weighing up word probabilities. Espín-Noboa still sees promise in them, but says they are not up to scratch as things stand.
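A toy sketch can show what "weighing up word probabilities" means in practice. It is not how any of the audited models is implemented: the candidate names and their scores are invented for illustration, and a real LLM scores tens of thousands of word fragments at every step rather than three whole names.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and scores, invented for this example.
candidates = ["Scientist A", "Scientist B", "Scientist C"]
raw_scores = [2.0, 1.0, 0.5]

probs = softmax(raw_scores)

# Sampling, rather than always taking the top score, means the same
# prompt can return different names on different runs.
rng = random.Random(0)
picks = [rng.choices(candidates, weights=probs, k=1)[0] for _ in range(5)]

print([round(p, 3) for p in probs])
print(picks)
```

Nothing in this process sorts candidates against an explicit criterion such as citations or links. Each answer is a draw from a probability distribution, which is consistent with the researchers' point that these models are not ranking algorithms.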
There is also a practical component to this research, as these researchers hope to ultimately create a way for people to better seek recommendations.
“Our final goal is to have a tool that a user can interact with easily using natural language,” says Barolo. This will be tailored to the needs of the user, allowing them to pick which issues are important to them.
“We believe that agency should be on the user, not on the LLM,” says Espín-Noboa. She uses the example of Google’s Gemini image generator overcorrecting for biases – representing American founding fathers (and Nazi soldiers) as people of colour after one update, and leading to it being temporarily suspended by the company.
Instead of having tech companies and programmers make sweeping decisions on the model’s output, users should be able to pick the issues most important to them.
The bigger picture
Research such as that going on at the Complexity Science Hub is happening across Europe and the world, as scientists race to understand how these new technologies are affecting our lives.
Academia has a “really important role to play”, says Lara Groves, a senior researcher at the Ada Lovelace Institute. Having studied how audits are taking place in various contexts, Groves says groups of academics – such as the annual FAccT conference on fairness, transparency and accountability – are “setting the terms of engagement” for audits.
Even without full access to training data and the algorithms these tools are built on, academia has “built up the evidence base for how, why and when you might do these audits”. But she warns these efforts can be hampered by the level of access that researchers are provided with, as they are often only able to look at the tools’ outputs.
Despite this, she would like to see more assessments taking place “at the foundation model layer”. Groves continues: “These systems are highly stochastic and highly dynamic, so it’s impossible to tell the range of outputs upstream.” In other words, the massive variability of what LLMs are producing means we ought to be checking under the hood before we start looking at their use cases.
Other industries – such as aviation or cyber security – already have rigorous processes for auditing. “It’s not like we’re working from first principles or from nothing. It’s identifying which of those mechanisms and approaches are analogous to AI,” Groves adds.
Amid an arms race for AI supremacy, any testing done by the major players is closely guarded. There have been occasional moments of openness: in August, OpenAI and Anthropic carried out audits on each other’s models and released their findings to the public.
Much of the work of interrogating LLMs will still fall to those outside of the tent. Methodical, independent research might allow us to glimpse into what’s driving these tools, and maybe even reshape them for the better.
Tech
The Future of EVs Is Foggy—but California Still Wants More of Them
It’s been a weird and confusing few weeks for the auto industry—especially for those who hoped to see more batteries on the road in the coming decade.
Just this month: Ford announced a retrenchment in its EV business, canceling some battery-powered vehicle plans and delaying others; the European Commission proposed to backtrack on its goal to transition fully to zero-emission cars by 2035; the US government said it would loosen rules that would have required automakers to ratchet up the fuel economy of their fleets. BloombergNEF now projects that 14 million fewer EVs will be sold in the US by 2030 than it forecast last year, a 20 percent drop.
What has not changed, it seems, is California’s interest in shifting to cleaner transportation. “The state is doubling down on our zero-emission vehicle deployment, providing market certainty, and continuing to lead on clean transportation regardless of policy reversals elsewhere or shifts by automakers,” Anthony Martinez, a spokesperson for Governor Gavin Newsom, wrote in a statement to WIRED. He said the governor’s “commitment to accelerating California’s clean transportation transition hasn’t changed.”
In 2020, Newsom became one of the first lawmakers in the world to commit to full electrification when he signed an executive order directing state agencies to create rules that would ban the sale of new gas-powered cars in the state by 2035. Those rules eventually aimed to ratchet up the share of battery-electric vehicles, with an ultimate goal of a mix of pure EVs and plug-in hybrids. (The PHEVs could only account for about 20 percent of sales.) Several other states, including Massachusetts, New York, Oregon, and Washington State, pledged to do the same.
Earlier this year, the GOP-led Congress revoked, through legislation, California’s power to set its own clean air regulations. The state responded with a lawsuit, which is still being argued. Meanwhile, Newsom signed another executive order directing state agencies to further the state’s electrification goals in other ways.
Now auto industry experts and players say the state’s determination to push through policy and market changes to meet its now half-decade-old goal may be overly ambitious.
“Getting to 100 percent might be challenging,” says Stephanie Valdez Streaty, the director of industry insights at Cox Automotive. “There are a lot of headwinds.”
A coalition of California business groups has argued that even the state’s goal for next year, a requirement that 35 percent of model year 2026 vehicles sold be zero-emission, isn’t realistic, and that California should push back its goals for zero-emission new car sales. (Enforcement of the rules is paused while the larger battle with US Congress plays out.) Zero-emission cars accounted for 21 percent of the overall annual state new car sales as of the fall, according to the California New Car Dealers Association, well below the 35 percent goal. “The timeline needed to be adjusted,” says the group’s president, Brian Maas.
Tech
Top 10 IT leadership interviews of 2025 | Computer Weekly
Artificial intelligence (AI) has been the biggest talking point for IT leaders in 2025 – both the emerging capabilities and opportunities from the technology, and the challenges of implementing it at scale and in a way that delivers measurable benefits.
For the digital, data and technology leaders that Computer Weekly is privileged to talk to every week, building AI into their wider strategies and managing often over-hyped expectations just adds to the difficulties of one of the most important roles in any modern organisation.
All of that is taking place while they need to keep a tight rein on costs in a still difficult economy, and juggle skills shortages, talent development and ensuring cyber security. So, how well are they doing?
Computer Weekly gets access to some of the top technology leaders in the world – and the details they share make fascinating reading for anyone looking to develop and implement an IT strategy to improve their business, support employees and enhance their careers.
Here are Computer Weekly’s top 10 interviews with IT leaders in 2025:
The BBC’s research and development (R&D) arm serves a public purpose, which, according to director Jatin Aythora, is to make some of the technologies and inventions it creates available for free or at a really low cost. Aythora sees his job as helping to achieve technical breakthroughs that the news and media industry can benefit from, which he says BBC R&D has done for many years. Computer Weekly talks to him about self-belief and learning from different industries.
The UK mapping service has moved on a long way from paper maps as it now looks to use AI to understand, interpret and derive insights from geographical data. CTO Manish Jethwa has a career-long passion for turning geographical data into useful insight, and he’s leading the organisation’s development of next-generation geospatial technologies.
As a technologist who also runs corporate operations, Thomson Reuters’ CTO believes her tech background gives her a unique edge as the business information group looks to transform its products with AI. That’s why she’s on a mission to use digital systems to transform internal processes and customer services.
Richard Masters, vice-president of data and AI at Virgin Atlantic, is an expert in enterprise data, but his career began somewhere different – space. Before moving into analytics, Masters completed a PhD in astrophysics at the University of Oxford. He is now applying his expertise in astrophysics to the nitty-gritty details of using AI to improve customer experience.
The vehicle recovery specialist is looking to AI and connected vehicle technology to enhance customer experience and get motorists back on the road in the shortest possible time. Group CIO Antony Hausdoerfer is driving the plan for digital transformation.
Digital media is core to engaging nearly two billion fans of Premier League football around the world, with data analytics and AI playing an ever-more important role. For Alexandra Willis, director of digital media and audience development at the organisation that runs top-level club football in England, the priority is to establish data-enabled experiences that keep fans just as engaged and entertained off the pitch.
Among the questions a head of technology may ponder are: what does it mean to be innovative, and, perhaps, what technology can be used to drive an innovation strategy? Given the main way people tend to place bets with Bet365 is via its mobile app, Alan Reed, head of platform innovation at Bet365’s Hillside Technology platform, talks to Computer Weekly about how generative AI changes the way people interact with computers.
Kate Balingit has been leading the digital health initiative at Mars Pet Nutrition, reporting to the company’s pet care CIO, where she is focused on commercialising and deploying artificial intelligence through well-known pet food brands such as Pedigree, Iams, Sheba and Whiskas. She talks to Computer Weekly about making AI relevant across its brands to support pet health.
Dan Keyworth, director of business technology at McLaren Racing, says his role involves running the tech at the sharp end of Formula One, all the IT infrastructure that must be deployed to Grand Prix races, and the IT that keeps the business of McLaren Racing on track.
The world of performing arts is in a completely different universe compared to the bits, bytes and IT infrastructure that Keith Nolan and the IT team at Royal Ballet and Opera spend their work time in. He talks about how IT lowers costs and helps power stage innovations for world-class performances.
Tech
Pair Your Mac Mini With One of These Great Monitors
Just about any monitor can work with a Mac Mini. It doesn’t need to be made by Apple or have any official certification. There’s a case to be made for using a cheap 1080p monitor with the Mac Mini, but most Mac users will want something a bit more premium. As you can see by options like the Dell 27 Plus 4K, that doesn’t have to mean overly expensive. Either way, here are the four elements to consider when shopping for a good monitor to go with your Mac Mini.
Size and resolution: 27-inch and 32-inch monitors are the most common sizes these days, and there are larger options. I would also consider a 34-inch ultrawide monitor if you like the wider, 21:9 aspect ratio with the curved shape. With Apple, resolution is king. There’s a reason it invests so much in high pixel density for every screen it sells, even down to the entry-level options like the MacBook Air. Pixel density is what gives a screen its sharpness, and you need a lot more pixels when they’re stretched out across a large, external monitor. If you want to keep the fidelity up, I wouldn’t buy anything under 4K, and bumping up to 5K or 6K on a 32-inch monitor can be helpful. You also want to consider the refresh rate here. A 120-Hz refresh rate is what the MacBook Pro has, offering smoother animation, especially in games.
Adjustability: Apple monitors and iMacs aren’t exactly known for adjustability. They often have none at all, and cost more when they do. That isn’t the best for your posture and ergonomics. Famously, Apple charges an extra $1,000 to add a Pro Stand with proper adjustability to the Pro Display XDR. For ergonomic purposes, the top of the screen you’re working on should be as close to eye level as possible, and that varies depending on someone’s height. If a monitor doesn’t have height adjustability, you’ll have to depend on a separate monitor stand or arm. Other than height adjustment, many monitors also have a stand that can swivel, tilt, and rotate, all of which are important when using multiple monitors together. Rotation is also needed if you want to use a second monitor vertically, which has become increasingly popular.
Ports: Even the cheapest monitors will almost always have an HDMI port, which is all you need to connect directly to the back of the Mac Mini. Some monitors have USB-C ports that support display input, which will let you connect to one of the Mac Mini’s Thunderbolt ports. The M4 Mac Mini comes with three Thunderbolt 4 ports, HDMI, and an Ethernet jack. The M4 Pro model has the same ports, except the ports are Thunderbolt 5 instead of 4. You’ll need to use at least one of these Thunderbolt ports if you want to connect more than one external monitor. Monitors with USB-C also tend to have other ports, such as USB-A. These can be useful, as the Mac Mini doesn’t have any USB-A ports of its own.
Image quality: Apple prides itself on the image quality of its Macs, so in the case of the Mac Mini, you’ll likely want to get something worthy of your Mac. This is especially important for content creators, photographers, and designers. So, you’ll want to consider a monitor’s brightness, color accuracy, color coverage, and contrast. While some IPS displays offer decent color and contrast, mini-LED or OLED displays will guarantee better image quality. These also have significantly higher peak brightness in HDR content, which really brings games and movies to life.
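To put numbers on the pixel density point made under size and resolution, here is a rough sketch that computes pixels per inch (PPI) from a screen's resolution and diagonal size. The helper name `pixels_per_inch` is made up for this example, and the panels listed are generic size and resolution combinations, not specific products.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Generic size/resolution combinations chosen for illustration.
panels = [
    ("27-inch 1080p", 1920, 1080, 27),
    ("27-inch 4K", 3840, 2160, 27),
    ("32-inch 4K", 3840, 2160, 32),
]

for label, width, height, diagonal in panels:
    print(f"{label}: {pixels_per_inch(width, height, diagonal):.1f} PPI")

ppi_27_1080p = pixels_per_inch(1920, 1080, 27)
ppi_27_4k = pixels_per_inch(3840, 2160, 27)
ppi_32_4k = pixels_per_inch(3840, 2160, 32)
```

At the same 27-inch size, a 4K panel has exactly twice the pixel density of a 1080p one, and spreading 4K across 32 inches lowers the density again. That is why a higher resolution helps more as the screen gets larger.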
