---
title: README
emoji: 🏢
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---

# ZA-African-Next-Voices (za-african-next-voices)

**Purpose:** This organization was created to manage and share datasets and models for the South African component of the African Next Voices (ANV) project.

If you’re looking for all datasets, models, or work from our broader team, visit our main org: [DSFSI](https://huggingface.co/dsfsi).

## About the South African Next Voices Project

**ZA-ANV** is building a **3,000-hour** multilingual, multi-domain speech dataset for South Africa, spanning seven local languages.

- **Languages:** Setswana, isiZulu, isiXhosa, Sesotho, Sepedi, isiNdebele, Tshivenda
- **Coverage:** 500 hours per language for the main five; 250 hours for isiNdebele and Tshivenda (pilot/experimental scale for future work)
- **Domains:** Broad/general domains to reflect real-world diversity
- **Goal:** Enable robust speech and language technology for local South African languages, break literacy barriers, and make digital content locally relevant.

# About DSFSI

**Data Science for Social Impact (DSFSI)** is a research group at the Computer Science Department, University of Pretoria. We work at the intersection of **Data Science for Society** and **Local Language NLP**.

Our mission: *Data-driven collaborative innovation to empower society to tackle challenges and preserve our languages.*

Find all our work and resources at: [huggingface.co/dsfsi](https://huggingface.co/dsfsi)

**Questions?** Contact us via our [DSFSI website](https://www.dsfsi.co.za) or through the main [DSFSI Hugging Face org](https://huggingface.co/dsfsi).