Tech
African languages for AI: The project that’s gathering a huge new dataset
Artificial intelligence (AI) tools like ChatGPT, DeepSeek, Siri or Google Assistant are developed mostly in the global north and trained largely in English, Chinese or European languages. In comparison, African languages are largely missing from the internet.
A team of African computer scientists, linguists, language specialists and others has been working on precisely this problem for the past two years. The African Next Voices project recently released what’s thought to be the largest dataset of African languages for AI so far. We asked the team about the project, which has sites in Kenya, Nigeria and South Africa.
Why is language so important to AI?
Language is how we interact, ask for help, and hold meaning in community. We use it to organize complex thoughts and share ideas. It’s the medium we use to tell an AI what we want—and to judge whether it understood us.
We are seeing an upsurge of applications that rely on AI, from education to health to agriculture. Many are powered by large language models (LLMs), which are trained on large volumes of mostly linguistic (language) data. Yet LLMs exist for only a few of the world’s languages.
Languages also carry culture, values and local wisdom. If AI doesn’t speak our languages, it can’t reliably understand our intent, and we can’t trust or verify its answers. In short: without language, AI can’t communicate with us—and we can’t communicate with it. Building AI in our languages is therefore the only way for AI to work for people.
If we limit whose language gets modeled, we risk missing out on the majority of human cultures, history and knowledge.
Why are African languages missing and what are the consequences for AI?
The development of language is intertwined with the histories of people. Many of those who experienced colonialism and empire have seen their own languages marginalized and left less developed than colonial languages. African languages are also recorded less often, including on the internet.
So there isn’t enough high-quality, digitized text and speech to train and evaluate robust AI models. That scarcity is the result of decades of policy choices that privilege colonial languages in schools, media and government.
Language data is just one of the things that’s missing. Do we have dictionaries, terminologies, glossaries? Basic tools are scarce, and many other issues raise the cost of building datasets. These include a shortage of African language keyboards, fonts, spell-checkers and tokenizers (which break text into smaller pieces so a language model can process it), as well as orthographic variation (differences in how words are spelled across regions), tone marking and rich dialect diversity.
The result is AI that performs poorly and sometimes unsafely: mistranslations, poor transcription, and systems that barely understand African languages.
In practice this denies many Africans access—in their own languages—to global news, educational materials, health care information, and the productivity gains AI can deliver.
When a language isn’t in the data, its speakers aren’t in the product, and AI cannot be safe, useful or fair for them. They end up without the language technology tools that could support service delivery. This marginalizes millions of people and widens the technology divide.
What is your project doing about it—and how?
Our main objective is to collect speech data for automatic speech recognition (ASR), a technology that converts spoken language into written text. ASR is an important tool for languages that are mostly spoken rather than written.
The bigger ambition of our project is to explore how data for ASR is collected and how much of it is needed to create ASR tools. We aim to share our experiences across different geographic regions.
The data we collect is diverse by design: it includes spontaneous and read speech across domains such as everyday conversations, health care, financial inclusion and agriculture. We are collecting data from people of diverse ages, genders and educational backgrounds.
Every recording is collected with informed consent, fair compensation and clear data-rights terms. Transcription follows language-specific guidelines and a wide range of other technical checks.
In Kenya, through Maseno Centre for Applied AI, we are collecting voice data for five languages. We’re capturing the three main language groups: Nilotic (Dholuo, Maasai and Kalenjin), Cushitic (Somali) and Bantu (Kikuyu).
Through Data Science Nigeria, we are collecting speech in five widely spoken languages—Bambara, Hausa, Igbo, Nigerian Pidgin and Yoruba. The dataset aims to accurately reflect authentic language use within these communities.
In South Africa, working through the Data Science for Social Impact lab and its collaborators, we have been recording seven South African languages. The aim is to reflect the country’s rich linguistic diversity: isiZulu, isiXhosa, Sesotho, Sepedi, Setswana, isiNdebele and Tshivenda.
Importantly, this work does not happen in isolation. We are building on the momentum and ideas from the Masakhane Research Foundation network, Lelapa AI, Mozilla Common Voice, EqualyzAI, and many other organizations and individuals who have been pioneering African language models, data and tooling.
Each project strengthens the others, and together they form a growing ecosystem committed to making African languages visible and usable in the age of AI.
How can this be put to use?
The data and models will be useful for captioning local-language media, building voice assistants for agriculture and health, and providing call-center and customer support in these languages. The data will also be archived for cultural preservation.
Larger, balanced, publicly available African language datasets will allow us to connect text and speech resources. Models will not just be experimental, but useful in chatbots, education tools and local service delivery. The opportunity is there to go beyond datasets into ecosystems of tools (spell-checkers, dictionaries, translation systems, summarization engines) that make African languages a living presence in digital spaces.
In short, we are pairing ethically collected, high-quality speech at scale with models. The aim is for people to be able to speak naturally, be understood accurately, and access AI in the languages they live their lives in.
What happens next for the project?
This project only collected voice data for certain languages. What of the remaining languages? What of other tools like machine translation or grammar checkers?
We will continue to work on multiple languages, ensuring that we build data and models that reflect how Africans use their languages. We prioritize building smaller language models that are both energy efficient and accurate for the African context.
The challenge now is integration: making these pieces work together so that African languages are not just represented in isolated demos, but in real-world platforms.
One of the lessons from this project, and others like it, is that collecting data is only step one. What matters is making sure that the data is benchmarked, reusable, and linked to communities of practice. For us, the next step is to ensure that the ASR benchmarks we build can connect with other ongoing African efforts.
We also need to ensure sustainability: that students, researchers, and innovators have continued access to compute (computer resources and processing power), training materials and licensing frameworks (like NOODL or Esethu). The long-term vision is to enable choice, so that a farmer, a teacher, or a local business can use AI in isiZulu, Hausa, or Kikuyu, not just in English or French.
If we succeed, AI built in African languages won’t just be catching up. It will be setting new standards for inclusive, responsible AI worldwide.
This article is republished from The Conversation under a Creative Commons license.
Citation:
African languages for AI: The project that’s gathering a huge new dataset (2025, October 19)
retrieved 19 October 2025
from https://techxplore.com/news/2025-10-african-languages-ai-huge-dataset.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
This Unique Air Fryer Cooks Your Food in Heat-Proof Glass—It’s on Sale Right Now
Want to be the hero of your next Super Bowl party? Check out the Ninja Crispi Portable Glass Air Fryer, an intriguing twist on the air fryer that’s ideal for air frying your favorite frozen snacks at your next potluck. It’ll help you win the day by frying up a batch of wings at your buddy’s house. It’s currently marked down to as low as $150 on Amazon, depending on your color preference.
Nearly every other air fryer works by circulating hot air through a purpose-built, egg-like basket. But the genius of the Crispi is that its glass fryer tray doubles as a serving tray and a sealable fryer dish. The heating element and fan live in a hat-like unit with clamps that attach to heat-shock-resistant borosilicate glass frying baskets with ceramic-coated trays inside. While the actual “fry” setting can be a little intense, the “bake” setting works perfectly for softer foods like veggies. Either way, the results spoke for themselves in our testing, with crispy fries and nuggets even outside in freezing temperatures.
Our reviewer Matthew Korfhage found the “recrisp” setting particularly useful for bringing leftover pizza slices and noodle dishes back to life. That makes it a great choice for office lunchtimes, where you can leave the Crispi itself in a drawer at your desk and ferry leftovers from home. You only have to wash a normal glass dish, rather than an entire fry basket as with most models, so this would also work well for dorms or even camping, if there’s an outlet nearby.
Of course, you’ll have to make a few compromises in order to take your precious air-fried snacks with you on the go. While the glass container does a surprisingly good job of insulating the food during cooking, the temperature control isn’t quite as precise as on some of our other favorite air fryers.
You’ll also have to be a little bit flexible on your aesthetic choices. While the pastel-hued Cherry Crush, Frosted Lilac, and Ginger Snap colors are marked down to the lower $150 price, the less pronounced Sage, Stone, and Cyberspace Gray are slightly higher at $160, but still under the usual $180 price tag. There’s also a Racing Green Bundle that includes all three sizes of glass crisping tray for $190, if you think you’ll end up buying them anyway.
All-Clad Is the Expensive Gold Standard. The Factory Seconds Sale Makes It More Affordable
All-Clad deals used to be difficult to find, but thankfully, the Factory Seconds Sale has come back around for a little while. These sales tend to last only a few days (this one expires at midnight tomorrow, January 21), though they are sometimes extended. In any case, these sales offer a reliable way to score a solid deal on All-Clad kitchenware, which is normally very expensive. We love and swear by All-Clad, as do many professional chefs.
Factory Seconds are products with minor imperfections that still perform as intended. Sometimes an item is “second quality,” meaning it might have some blemishes or dents. Sometimes an item just has packaging damage. Every product page lists the exact reason for the “Factory Seconds” designation, as well as its warranty; most items are backed by All-Clad’s lifetime warranty. Note that you’ll need to enter your email to access the sale, and flat-rate shipping adds $10. Orders ship in 10 to 15 business days. We’ve highlighted our favorite deals below.
Make sure to check out our related buying guides, including the Best Chef’s Knives, Best Meal Kit Subscriptions, and Best Coffee Makers.
Best All-Clad Factory Seconds Deals
We include this pan at every possible opportunity when it’s on sale because it’s such a solid kitchen companion. Many WIRED Reviews team members have it in their kitchens. The shape allows you to make a pan sauce or sear up some steaks. The high walls prevent grease splatter, and you can use it like a wok or Dutch oven in addition to a regular ol’ pan. It’s dishwasher-safe for easy cleanup.
This roaster is a staple in my kitchen during the colder months of the year. It’s safe to use in the oven and under the broiler at up to 600 degrees Fahrenheit, and it has enough room for roasting meats or vegetables in large portions (it can hold up to a 20-pound turkey). You can also transfer it to the stovetop to whip up a quick sauce with the roasted drippings. The manufacturer recommends hand-washing.
This hard-anodized nonstick pan is versatile enough to make just about anything: eggs, vegetables, a pan sauce, and stir-fries are all contenders. It’s made with a PTFE coating, so avoid overheating it, use nonstick-safe tools, and hand-wash it to preserve that coating for as long as you can.
This nonstick pot has a PTFE coating and therefore should be hand-washed. It can be used to simmer, stew, or steam thanks to its tall sides and included lid. It also comes with a steamer basket for all of your vegetable and/or dumpling needs. The pieces nest together for easier storage.
So technically, this thing isn’t a spatula, but in my house that’s what we’d call it. Whether you’re Team Turner or not, nonstick-safe tools can be difficult to come by, and they’re crucial to keep around if you’re cooking on nonstick cookware. I like having backups so I don’t have to constantly do dishes. This turner is heat-safe up to 425 degrees Fahrenheit and will come in handy for everything from eggs to grilled cheese sandwiches.
The exact reasoning for this being a Factory Seconds item isn’t listed, but a good cast-iron skillet is indispensable for every home chef. It has two pour spouts for easier draining or serving, and the finish is resistant to scratches and stains. The skillet is oven-safe up to 650 degrees Fahrenheit.
What Are All-Clad Factory Seconds?
The Factory Seconds Event is run by Home and Cook Sales, an authorized reseller for All-Clad and several other cookware brands. The items featured in the sale (usually) have minor imperfections, like a scuff on the pan, a misaligned name stamp, or simply a dented box. Every product on the website lists the nature of the imperfection in the title (e.g., packaging damage). You’ll need to enter an email address to access the sale.
While the blemishes vary, the merchant says all of the cookware will perform as intended. Should any issue arise, nearly every All-Clad Factory Seconds product is backed by All-Clad’s limited lifetime warranty. (Electric items have a slightly different warranty; check individual product pages for details.) We’ve used more than a dozen Factory Seconds pots, pans, and accessories, and they’ve all worked exactly as advertised. Just remember that all sales are final, and note that you’ll have to pay $10 for shipping. It’s also worth noting that the “before” prices are based on buying the items new, but we still think they offer a good indication of how much you’re saving versus the value you’re getting.
‘Veronika’ Is the First Cow Known to Use a Tool
Justice for Far Side cartoonist Gary Larson: A team of scientists has observed, for the first time, a cow using a tool in a flexible manner. The ingenuity of “Veronika,” as the animal is called, shows that cattle possess enough intelligence to manipulate elements of their environment and solve challenges they would otherwise be unable to overcome.
Veronika is a pet cow in Austria. Her owners don’t use her for meat or milk production. Nor was she trained to do tricks. Instead, over the past 10 years she has developed the ability to find branches in the grass, choose one, hold it in her mouth, and scratch herself with it to relieve skin irritation.
Until now, only chimpanzees had convincingly demonstrated flexible, multipurpose use of tools to improve their living conditions. Recent studies also point to whales as the only marine animals capable of using complex tools. This European cow is about to join that exclusive group of ingenious animals.
Videos of Veronika circulating online caught the attention of veterinary researchers in Vienna. They visited the farm, conducted behavioral tests, and carried out controlled trials. “In repeated sessions, they verified that her decisions were consistent and functionally appropriate,” a press release stated.
Veronika’s abilities go beyond simply using a stick to scratch herself, explain the authors of the study published in Current Biology. In the tests, the cow was offered different textures and objects, and she adapted according to her needs. Sometimes she chose soft bristles and other times a stiffer point. The researchers say she used different parts of the same tool for specific purposes and even modified her technique depending on the type of object or the area of her body she wanted to scratch.
Although the specialists consider using a tool to relieve irritation “less complex” than, for example, using a sharp rock to access seeds, they still value Veronika’s ability highly. She shows that she can decide which part of the tool is most useful to her. According to the authors, the finding suggests that we have underestimated the cognitive capacity of cattle.
Why Is Veronika So Skilled?
The team acknowledges that it’s still too early to say that all cows can use tools with the same skill as Veronika. For now, the researchers are trying to determine how this cow developed an awareness of her surroundings.
Researchers believe her particular circumstances played a role. Veronika has lived for 10 years in a complex, open environment filled with manipulable objects—a very different experience from that of cattle raised for milk and meat production. These conditions fostered exploratory and innovative behavior, they say. They are now searching for more videos of cattle using tools to gather further evidence about their cognitive abilities.
“Until now, tool use was considered a select club, almost exclusively for primates (especially great apes, but also macaques and capuchins), some birds like corvids and parrots, and marine mammals like dolphins. Finding it in a cow is a fascinating example of convergent evolution: intelligence arises as a response to similar problems, regardless of how different the animal’s ‘design’ may be,” said Miquel Llorente, director of the Department of Psychology at the University of Girona, who was not involved in the study, in a statement to the Science Media Centre Spain.