Tech

Here Are the Housewarming Gifts I Love Having in My Home




Make a house into a home with these gifts, whether your recipient is moving into their first house or a great new apartment.




African languages for AI: The project that’s gathering a huge new dataset



Credit: Unsplash/CC0 Public Domain

Artificial intelligence (AI) tools like ChatGPT, DeepSeek, Siri or Google Assistant are developed by the global north and trained in English, Chinese or European languages. In comparison, African languages are largely missing from the internet.

A team of African computer scientists, linguists, language specialists and others have been working on precisely this problem for two years already. The African Next Voices project recently released what’s thought to be the largest dataset of African languages for AI so far. We asked them about their project, with sites in Kenya, Nigeria and South Africa.

Why is language so important to AI?

Language is how we interact, ask for help, and hold meaning in community. We use it to organize complex thoughts and share ideas. It’s the medium we use to tell an AI what we want—and to judge whether it understood us.

We are seeing an upsurge of applications that rely on AI, from education to health to agriculture. These models are trained on large volumes of (mostly) linguistic data. They are called large language models, or LLMs, but they exist for only a few of the world’s languages.

Languages also carry culture, values and local wisdom. If AI doesn’t speak our languages, it can’t reliably understand our intent, and we can’t trust or verify its answers. In short: without language, AI can’t communicate with us—and we can’t communicate with it. Building AI in our languages is therefore the only way for AI to work for people.

If we limit whose language gets modeled, we risk missing out on the majority of human cultures, history and knowledge.

Why are African languages missing and what are the consequences for AI?

The development of language is intertwined with the histories of people. Many of those who experienced colonialism and empire have seen their own languages marginalized and not developed to the same extent as colonial languages. African languages are recorded less often, including on the internet.

So there isn’t enough high-quality, digitized text and speech to train and evaluate robust AI models. That scarcity is the result of decades of policy choices that privilege colonial languages in schools, media and government.

Language data is just one of the things that’s missing. Do we have dictionaries, terminologies, glossaries? Basic tools are few, and many other issues raise the cost of building datasets. These include a shortage of African language keyboards, fonts, spell-checkers and tokenizers (which break text into smaller pieces so a language model can understand it), along with orthographic variation (differences in how words are spelled across regions), tone marking and rich dialect diversity.
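To illustrate why tone marking complicates tokenization, here is a minimal Python sketch. It is added for clarity and is not taken from the project's tooling. It assumes a deliberately naive tokenizer (the hypothetical `char_tokens`, one token per Unicode code point) and shows that the same tone-marked word can be stored in two equivalent Unicode forms that such a tokenizer splits differently.

```python
import unicodedata


def char_tokens(text):
    """Naive tokenizer for illustration only: one token per Unicode code point."""
    return list(text)


word = "Yorùbá"

# NFC stores each tone-marked vowel as a single precomposed character.
composed = unicodedata.normalize("NFC", word)
# NFD stores the base vowel followed by a separate combining tone mark.
decomposed = unicodedata.normalize("NFD", word)

# The two strings look identical on screen but are not equal as data.
print(composed == decomposed)
print(len(char_tokens(composed)), len(char_tokens(decomposed)))

# Normalizing everything to one form before tokenizing removes the mismatch.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Production tokenizers are far more sophisticated, but the same principle applies: text collected from different keyboards and sources usually needs consistent normalization before it is used for training.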

The result is AI that performs poorly and sometimes unsafely: mistranslations, poor transcription, and systems that barely understand African languages.

In practice this denies many Africans access—in their own languages—to global news, educational materials, health care information, and the productivity gains AI can deliver.

When a language isn’t in the data, its speakers aren’t in the product, and AI cannot be safe, useful or fair for them. They end up missing the necessary language technology tools that could support service delivery. This marginalizes millions of people and increases the technology divide.

What is your project doing about it—and how?

Our main objective is to collect speech data for automatic speech recognition (ASR). ASR is an important tool for languages that are largely spoken. This technology converts spoken language into written text.

The bigger ambition of our project is to explore how data for ASR is collected and how much of it is needed to create ASR tools. We aim to share our experiences across different geographic regions.

The data we collect is diverse by design: spontaneous and read speech across various domains, including everyday conversations, health care, financial inclusion and agriculture. We are collecting data from people of diverse ages, genders and educational backgrounds.

Every recording is collected with informed consent, fair compensation and clear data-rights terms. We transcribe with language-specific guidelines and a large range of other technical checks.

In Kenya, through Maseno Centre for Applied AI, we are collecting voice data for five languages. We’re capturing the three main language groups: Nilotic (Dholuo, Maasai and Kalenjin), Cushitic (Somali) and Bantu (Kikuyu).

Through Data Science Nigeria, we are collecting speech in five widely spoken languages—Bambara, Hausa, Igbo, Nigerian Pidgin and Yoruba. The dataset aims to accurately reflect authentic language use within these communities.

In South Africa, working through the Data Science for Social Impact lab and its collaborators, we have been recording seven South African languages. The aim is to reflect the country’s rich linguistic diversity: isiZulu, isiXhosa, Sesotho, Sepedi, Setswana, isiNdebele and Tshivenda.

Importantly, this work does not happen in isolation. We are building on the momentum and ideas from the Masakhane Research Foundation network, Lelapa AI, Mozilla Common Voice, EqualyzAI, and many other organizations and individuals who have been pioneering African language models, data and tooling.

Each project strengthens the others, and together they form a growing ecosystem committed to making African languages visible and usable in the age of AI.

How can this be put to use?

The data and models will be useful for captioning local-language media, for voice assistants in agriculture and health, and for call-center and customer support in these languages. The data will also be archived for cultural preservation.

Larger, balanced, publicly available African language datasets will allow us to connect text and speech resources. Models will not just be experimental, but useful in chatbots, education tools and local service delivery. The opportunity is there to go beyond datasets into ecosystems of tools (spell-checkers, dictionaries, translation systems, summarization engines) that make African languages a living presence in digital spaces.

In short, we are pairing ethically collected, high-quality speech at scale with models. The aim is for people to be able to speak naturally, be understood accurately, and access AI in the languages they live their lives in.

What happens next for the project?

This project only collected voice data for certain languages. What of the remaining languages? What of other tools like machine translation or grammar checkers?

We will continue to work on multiple languages, ensuring that we build data and models that reflect how Africans use their languages. We prioritize building smaller language models that are both energy efficient and accurate for the African context.

The challenge now is integration: making these pieces work together so that African languages are not just represented in isolated demos, but in real-world platforms.

One of the lessons from this project, and others like it, is that collecting data is only step one. What matters is making sure that the data is benchmarked, reusable, and linked to communities of practice. For us, the “next” is to ensure that the ASR benchmarks we build can connect with other ongoing African efforts.

We also need to ensure sustainability: that students, researchers, and innovators have continued access to compute (computer resources and processing power), training materials and licensing frameworks (like NOODL or Esethu). The long-term vision is to enable choice, so that a farmer, a teacher, or a local business can use AI in isiZulu, Hausa, or Kikuyu, not just in English or French.

If we succeed, built-in AI in African languages won’t just be catching up. It will be setting new standards for inclusive, responsible AI worldwide.

Provided by The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: African languages for AI: The project that’s gathering a huge new dataset (2025, October 19), retrieved 19 October 2025 from https://techxplore.com/news/2025-10-african-languages-ai-huge-dataset.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.






Jona Health’s Mail-Order Kit Helps You Decode Your Microbiome



Look, there’s nothing quite like starting your day by pooping on a little paper hammock affixed to your toilet seat and then poking it a bunch of times with a cotton swab. It was more of a mental hurdle than a practical one, though, as collection and disposal (you just flush the hammock down when you’re done) were easy enough. You then swish the stick around in a solution, cap it, and send it off. Twenty days later, I got an email that my results were in.

On the website, your results are broken down into a few sections: Summary (with tabs for Brain Health, GI Health, Metabolic Health, Skin Health, and Physical Performance), Action Plan (with tabs for Highest Impact, Diet, Lifestyle, and Probiotics), and the Organisms page, which shows you every single organism it found in your sample, and their relative abundance. Mine held some surprises.

On the positive side, my Microbiome Diversity came in at 4.19 on the Shannon Index, above the normal range of 2.80–3.99. The report says that’s a sign of a healthy microbiome, and it didn’t find any pathogens or parasites. It says I digest lactose well (thank goodness). It didn’t find any associations for things like depression, celiac disease, IBS, ulcerative colitis, leaky gut, hypertension, eczema, or a bunch of other things that I’m thankful to not have. Some of these were actually a bit puzzling, frankly, as I’ve struggled with insomnia pretty much my entire life, but it didn’t find any associations there, or for fatigue, and I am most assuredly a tired human.
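For readers curious about what a Shannon Index score means, here is a minimal Python sketch of the standard formula, H = -sum(p_i * ln p_i), where p_i is each organism's share of the sample. The article does not say how Jona Health computes its score or which logarithm base it uses, so the natural log and the sample abundances below are assumptions for illustration only.

```python
import math


def shannon_index(abundances):
    """Shannon diversity: H = -sum(p_i * ln(p_i)) over organisms with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for amount in abundances:
        if amount > 0:
            share = amount / total  # relative abundance p_i
            h -= share * math.log(share)
    return h


# Hypothetical samples: one perfectly even, one dominated by a single organism.
even_sample = [1] * 66
skewed_sample = [90] + [1] * 10

print(round(shannon_index(even_sample), 2))  # ln(66), roughly 4.19
print(round(shannon_index(skewed_sample), 2))
```

Under these assumptions, a score of 4.19 is what a sample of 66 equally abundant organisms would produce. Real samples are uneven, so reaching the same score in practice takes more organisms than that.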

As for the associations it did find, some were things I suspected, while others were total surprises. Under Brain Health, I had a moderate association for stress and a low association for ADHD, neither of which shocked me. Under Metabolic Health was a “very low” association for prediabetes, which I actually thought would be higher, unfortunately. I had a moderate association with osteoarthritis, which made sense, given my family history.




How to Protect Yourself Against Getting Locked Out of Your Cloud Accounts



If you’re sensitive to tech disasters, you might want to look away now: A recent Reddit thread tells the story of an unfortunate user who found 30 years of photos and work locked away and inaccessible in Microsoft OneDrive.

The individual made use of their cloud storage account to consolidate files from various hard drives, which had to be discarded due to a move. The plan was to then move the files back from OneDrive to new hard drives, but before the user was able to do this, their account was locked by Microsoft—without any reason given.

It’s still not clear why the account was locked or why Microsoft has so far ignored the user’s appeals to restore access, but it’s a warning to the rest of us—and a reminder to put a few basic protections and precautions in place.

Keep Multiple Backups

It used to be a truth universally acknowledged that data wasn’t properly backed up until it was backed up twice, in two separate locations. You can copy your important files to an external hard drive, but if it’s in the same room as your laptop, then theft, fire, or flood can wipe out both copies at the same time.

Today, having two backups of everything—so three copies in total—might seem excessive, as cloud storage services so rarely go down. We’ve all become used to the idea that the data we’ve logged with Microsoft, Google, Apple, or other providers is always going to be available, so we don’t need to worry about it.
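The "three copies in total" rule can be sketched as a short Python script. This is an illustration under stated assumptions, not a tool the article recommends: the helper names `sha256_of` and `back_up` are hypothetical, and the destination folders stand in for wherever your second and third copies live (an external drive, a second cloud provider's sync folder, and so on).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies


# Example call with hypothetical paths (original + two backups = three copies):
# back_up("family_photos.zip", ["/mnt/external_drive/backups", "/mnt/offsite_sync/backups"])
```

The checksum step matters because a backup only counts if it can be read back intact. Keeping the two destinations in physically separate places is still up to you.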

Apps will often push you to delete local copies of your files.

Photograph: David Nield


