Tech

African languages for AI: The project that’s gathering a huge new dataset

Published October 19, 2025

Credit: Unsplash/CC0 Public Domain

Artificial intelligence (AI) tools like ChatGPT, DeepSeek, Siri or Google Assistant are developed in the global north and trained on English, Chinese or European languages. By comparison, African languages are largely missing from the internet.

A team of African computer scientists, linguists, language specialists and others has been working on precisely this problem for two years. The African Next Voices project recently released what’s thought to be the largest dataset of African languages for AI so far. We asked them about their project, which has sites in Kenya, Nigeria and South Africa.

Why is language so important to AI?

Language is how we interact, ask for help, and hold meaning in community. We use it to organize complex thoughts and share ideas. It’s the medium we use to tell an AI what we want—and to judge whether it understood us.

We are seeing an upsurge of applications that rely on AI, from education to health to agriculture. These applications are built on models trained on large volumes of (mostly) language data. Known as large language models, or LLMs, they exist for only a few of the world’s languages.

Languages also carry culture, values and local wisdom. If AI doesn’t speak our languages, it can’t reliably understand our intent, and we can’t trust or verify its answers. In short: without language, AI can’t communicate with us—and we can’t communicate with it. Building AI in our languages is therefore the only way for AI to work for people.

If we limit whose language gets modeled, we risk missing out on the majority of human cultures, history and knowledge.

Why are African languages missing and what are the consequences for AI?

The development of language is intertwined with the histories of people. Many of those who experienced colonialism and empire have seen their own languages marginalized and left less developed than colonial languages. African languages are recorded less often, including on the internet.

So there isn’t enough high-quality, digitized text and speech to train and evaluate robust AI models. That scarcity is the result of decades of policy choices that privilege colonial languages in schools, media and government.

Language data is just one of the things that’s missing. Do we have dictionaries, terminologies, glossaries? Basic tools are few, and many other gaps and complications raise the cost of building datasets. These include a shortage of African language keyboards, fonts, spell-checkers and tokenizers (which break text into smaller pieces so a language model can understand it), as well as orthographic variation (differences in how words are spelled across regions), tone marking and rich dialect diversity.
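To make the point about tone marking and spelling variation concrete, here is a minimal Python sketch. It is an illustration only, not the project's actual pipeline: the sample word, the function names and the choice of Unicode NFC normalization are all assumptions made for this example. It shows how one tone-marked word, typed with two different keyboard conventions, can look like two different words to a computer until the text is normalized.

```python
import unicodedata
from collections import Counter


def normalize_text(text: str) -> str:
    """Return text in Unicode NFC form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def vocabulary(words: list[str], normalize: bool) -> Counter:
    """Count distinct word forms, optionally normalizing each word first."""
    if normalize:
        words = [normalize_text(w) for w in words]
    return Counter(words)


# Hypothetical sample: the same tone-marked word entered in two ways.
precomposed = "\u1ecdj\u1ecd\u0301"    # single code point for "o with dot below"
decomposed = "o\u0323jo\u0323\u0301"   # plain "o" plus a combining dot below

raw = vocabulary([precomposed, decomposed], normalize=False)
clean = vocabulary([precomposed, decomposed], normalize=True)

# Without normalization the two spellings count as two vocabulary entries;
# with normalization they collapse into one.
print(len(raw), len(clean))
```

Every step like this has to be decided and checked per language, which is part of why preparing African language datasets costs more than it might first appear.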

The result is AI that performs poorly and sometimes unsafely: mistranslations, poor transcription, and systems that barely understand African languages.

In practice this denies many Africans access—in their own languages—to global news, educational materials, health care information, and the productivity gains AI can deliver.

When a language isn’t in the data, its speakers aren’t in the product, and AI cannot be safe, useful or fair for them. They end up missing the necessary language technology tools that could support service delivery. This marginalizes millions of people and increases the technology divide.

What is your project doing about it—and how?

Our main objective is to collect speech data for automatic speech recognition (ASR). ASR is an important tool for languages that are largely spoken. This technology converts spoken language into written text.

The bigger ambition of our project is to explore how data for ASR is collected and how much of it is needed to create ASR tools. We aim to share our experiences across different geographic regions.

The data we collect is diverse by design. It includes spontaneous and read speech across various domains, including everyday conversations, health care, financial inclusion and agriculture. We are collecting data from people of diverse ages, genders and educational backgrounds.

Every recording is collected with informed consent, fair compensation and clear data-rights terms. We transcribe using language-specific guidelines and apply a wide range of other technical checks.

In Kenya, through Maseno Centre for Applied AI, we are collecting voice data for five languages. We’re capturing the three main language groups: Nilotic (Dholuo, Maasai and Kalenjin), Cushitic (Somali) and Bantu (Kikuyu).

Through Data Science Nigeria, we are collecting speech in five widely spoken languages—Bambara, Hausa, Igbo, Nigerian Pidgin and Yoruba. The dataset aims to accurately reflect authentic language use within these communities.

In South Africa, working through the Data Science for Social Impact lab and its collaborators, we have been recording seven South African languages. The aim is to reflect the country’s rich linguistic diversity: isiZulu, isiXhosa, Sesotho, Sepedi, Setswana, isiNdebele and Tshivenda.

Importantly, this work does not happen in isolation. We are building on the momentum and ideas from the Masakhane Research Foundation network, Lelapa AI, Mozilla Common Voice, EqualyzAI, and many other organizations and individuals who have been pioneering African language models, data and tooling.

Each project strengthens the others, and together they form a growing ecosystem committed to making African languages visible and usable in the age of AI.

How can this be put to use?

The data and models will be useful for captioning local-language media, building voice assistants for agriculture and health, and running call centers and customer support in these languages. The data will also be archived for cultural preservation.

Larger, balanced, publicly available African language datasets will allow us to connect text and speech resources. Models will not just be experimental, but useful in chatbots, education tools and local service delivery. The opportunity is there to go beyond datasets into ecosystems of tools (spell-checkers, dictionaries, translation systems, summarization engines) that make African languages a living presence in digital spaces.

In short, we are pairing ethically collected, high-quality speech at scale with models. The aim is for people to be able to speak naturally, be understood accurately, and access AI in the languages they live their lives in.

What happens next for the project?

This project only collected voice data for certain languages. What of the remaining languages? What of other tools like machine translation or grammar checkers?

We will continue to work on multiple languages, ensuring that we build data and models that reflect how Africans use their languages. We prioritize building smaller language models that are both energy efficient and accurate for the African context.

The challenge now is integration: making these pieces work together so that African languages are not just represented in isolated demos, but in real-world platforms.

One of the lessons from this project, and others like it, is that collecting data is only step one. What matters is making sure that the data is benchmarked, reusable, and linked to communities of practice. For us, the “next” is to ensure that the ASR benchmarks we build can connect with other ongoing African efforts.

We also need to ensure sustainability: that students, researchers, and innovators have continued access to compute (computer resources and processing power), training materials and licensing frameworks (like NOODL or Esethu). The long-term vision is to enable choice, so that a farmer, a teacher, or a local business can use AI in isiZulu, Hausa, or Kikuyu, not just in English or French.

If we succeed, built-in AI in African languages won’t just be catching up. It will be setting new standards for inclusive, responsible AI worldwide.

Provided by
The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation:
African languages for AI: The project that’s gathering a huge new dataset (2025, October 19)
retrieved 19 October 2025
from https://techxplore.com/news/2025-10-african-languages-ai-huge-dataset.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

Tech

Hide Ethernet Cables Around Your Home for Faster Internet Access

Published March 8, 2026

cineplex360

Cable ties are ideal for keeping multiple cables bound together and making them easier to manage. You probably have a bunch already, but a pack of 60 reusable ones ($7) is cheap.

Alex Tech

10-Foot Cable Sleeve

Cable sleeves are even better, since they provide a mesh cover for bundles of cables, making it easy to remove or add cables.

Label Your Cables

If you have more than one cable, make sure that you label them. This can save you a lot of trouble later. Picking a different color for your Ethernet cables (or at least not black, white, or gray) can help you to immediately tell them apart from other cable types, especially handy if you’re installing them behind walls or under floors.

How to Hide Ethernet Cables

There are several ways to hide Ethernet cables, and some are much tougher and more invasive than others.

Running an Ethernet cable along your baseboard or skirting board can be reasonably neat, and it’s easy to do. Depending on your baseboard style, there might be a suitable channel or recess, and you can use cable clips with nails or adhesive. The tricky part is dealing with doors and transitions between rooms. If you’re lucky, there might be enough of a gap under your door, though it can be neater and safer to drill a hole through the wall to get the cable from one room to the next.

Probably the easiest way to hide cables is to stick them under your carpets. It’s best to stay tight to the baseboards to minimize the risk of anyone standing on the cable. If you have carpet grippers around the edges, you may be able to run cables on either side of them to keep them neatly out of the way. Just make sure to avoid high-traffic areas, and if you do have to run a cable across a doorway, get a proper cable protector.

D-Line

6-Foot Floor Cord Cover

If you don’t want visible cables, but can’t go into or under the wall, cable raceways or trunking could be the answer. You can get kits with various lengths of trunking with angled turns to run your cable. The best trunking can also be painted to match your baseboard or walls, which really helps it blend in.

D-Line

Mini Cable Trunking 4-Meter Pack

Maybe your cable run could be an excuse to upgrade your rooms with some crown molding or coving. Crown molding that runs around the top of a room, where the wall meets the ceiling, is easy to fit and can add a decorative flourish and hide paintwork. It can also contain a channel with an Ethernet cable inside, though you’ll still need a neat solution to run the cable in and out.

Behind the Wall or Under the Floor

For the neatest finish, you can’t beat running cable behind your wall or under the floor, but this is also the most difficult way to do it. You need various tools, and it can be a messy job, with potential risks including electrical cables and water pipes. If you’re up for the challenge and your home is suitable, here are a few things that can help you do a good job.

Boeray Fiberglass Flexible Snake Rods ($19): These extendable, flexible rods make it easier to run cables from spot A to spot B with limited access.

Tech

Is Daylight Saving Time Killing Your Mornings? This Gadget Can Save Them

Published March 8, 2026

cineplex360

Ultimately, these lights can do a lot. They can double as a sound machine, help you wake up and fall asleep, and even act as a regular bedside lamp if they’re bright enough. Not all sunrise alarms have all of these features, though, so you have to choose how much you want to spend and what features are most important to you.

What Features Should You Look for in a Sunrise Alarm Clock?

You might see a range of features listed for a sunrise alarm, and more expensive ones will include more of these than cheaper models. If you’re not sure what features you want, try this series of questions to figure out what features you need.

Do you struggle to fall asleep? Splurge on a sunrise alarm with a nighttime or wind-down routine. These help build a routine for you to fall asleep to.

Do you need one device that doubles as an alarm and a bedside lamp? Get a brighter sunrise clock that has easy controls to switch it on as a bedside lamp. Not all sunrise clocks have these, so check the details carefully (and reviews like mine!) and note that cheaper, smaller sunrise alarm clocks usually won’t brighten an entire bedroom.

Are you picky about your alarm sounds? Check how many sounds are offered. Just about every sunrise clock has some sound machine features and options, but cheaper ones tend to only have a couple of sounds and might not have the sound you’re looking for.

Do you want app control? Some options in this guide don’t have a partner app or Wi-Fi capabilities, especially some of my favorites. An app doesn’t necessarily make it a better sunrise clock, but it can be convenient to use. If you prefer an app to set up your sunrise lamp, shop the Casper, Hatch, Loftie, and WiiM.

Which Sunrise Alarm Clocks Are Best?

Lumie

Bodyclock Luxe 700FM

This sunrise alarm is my favorite. It’s big and bright with a stylish exterior, and a button for lamp mode makes it easy to switch on in the evening as a regular lamp. It was bright enough to fill my bedroom like a normal lamp. It has a nice range of sounds, and it not only connects to the radio but also lets you save five stations. There are both sunrise and sunset settings. The biggest downside is that it only has a 24-hour clock, and it doesn’t connect to Wi-Fi or an app, so you have to set the time manually (and change it manually for daylight saving). If you want to spend less, the Shine 300 ($169) is a little smaller and has fewer sounds, but otherwise is similarly great.

Tech

Left-Handed People Are More Competitive, Says Science

Published March 8, 2026

cineplex360

The very existence of left-handedness seems to defy Darwin. According to the theory of evolution by natural selection (in very simplified terms), a species should retain the characteristics necessary for survival and reproduction and discard those that are not very useful. And yet around 10 percent of people continue to develop greater dexterity in their left hand, a rate that has remained stable throughout history. Why do humans continue to retain this peculiar ability?

A study conducted by researchers at the University of Chieti-Pescara in Italy set out to test the hypothesis that, while right-handed people have advantages in cooperative behaviors, left-handed people (particularly males, the study notes) have advantages in competitive behaviors, especially in one-on-one situations. This hypothesis is based on evolutionarily stable strategy (ESS), a concept from game theory applied to evolution.

This is how ESS explains why the proportion of left-handed people remains low but constant. If almost everyone in a population is right-handed, being left-handed offers a frequency-dependent advantage: Being in the minority, left-handers are less predictable in competitive interactions (e.g., a boxing match), which may translate into small advantages (left hook!). But if left-handedness became very common, that advantage would disappear because others would adapt to encountering left-handers with the same frequency. In evolutionary terms, a “stable equilibrium” is reached when the majority are right-handed and a minority are left-handed, because neither “strategy” can completely eliminate the other since their advantages change depending on how frequent each is in the population.

How can a study support this hypothesis? The Italian researchers conducted two experiments to see whether a dominant hand is linked to any specific personality type. The results were recently published in the academic journal Scientific Reports.

Righty vs. Lefty

In the first experiment, about 1,100 participants completed questionnaires designed to measure their handedness (their level of dexterity between one hand and the other) and various facets of competitiveness, such as their inclination to achieve personal goals or their aversion to anxiety-driven competition. The results showed that people who identified with greater left-handed laterality tended to show higher levels of personal development-oriented competitiveness and lower levels of anxious avoidance. That is, left-handers tended to be more inclined to engage in competitive situations than right-handers.

In addition, when strongly lateralized groups were compared (just pure southpaws, no ambidexterity), left-handers scored higher on “hypercompetitiveness,” a trait that implies an intense desire to win, even at the expense of others.

In the second experiment, a subgroup of 48 participants (half right-handed and half left-handed, with equal proportions of men and women) took a pegboard test, a classic laboratory test that measures manual dexterity. Interestingly, no significant differences were observed here either between left-handers and right-handers or between laterality measures and competitiveness scores. This suggests that hand preference and competitiveness are not directly related to motor skills.

Give Them a Hand

According to the authors of the study, left-handedness is not simply a biological accident, but a characteristic that may offer advantages in competitive contexts and is therefore worth preserving. This supports, at least in part, the idea that the unequal distribution between right-handers and left-handers could be maintained by an evolutionary balance. While the right-handed majority favors social cooperation, the left-handed minority benefits in competitive contexts, where surprise plays a role.

But what about other personality types? Are left-handed people more extroverted or more emotionally unstable? The study cited here found no significant differences between left-handed and right-handed people in the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism). Nor was there any relationship between handedness and levels of depression or anxiety in this sample of people without a psychiatric diagnosis. This suggests that the advantage associated with left-handedness is more linked to competitiveness than to general differences in personality or mental health.

The study also examined differences by sex. Men, in general, scored higher on hyper-competitiveness and development-oriented competitiveness, while women showed a greater tendency to avoid competition due to anxiety. This suggests that the interaction between hand preference, competitive profile, and gender is complex and likely influenced by multiple biological and environmental factors that warrant further investigation.

This story originally appeared on WIRED en Español and has been translated from Spanish.
