At its Speech AI Summit, NVIDIA introduced a speech AI ecosystem, developed with Mozilla Common Voice, for building automatic speech recognition (ASR) models that work across languages worldwide.
The new ecosystem focuses on developing crowdsourced multilingual speech corpora and open-source pre-trained models, and on expanding the speech data available for low-resource languages.
According to NVIDIA, the initiative will focus on helping AI models handle speaker and language diversity, different accents, and different noise profiles. Developers will be able to train their models on Mozilla Common Voice datasets and then offer those pre-trained models as open-source automatic speech recognition architectures.
The Mozilla Common Voice platform currently supports 100 languages, including six new ones (Taiwanese, Bengali, Cantonese, Tigre (Eritrean), Meadow Mari, and Toki Pona), and includes more than 24,000 hours of speech data from 500,000 contributors worldwide, including a growing number of female speakers.
Through the Mozilla Common Voice platform, users donate audio by recording sentences as short voice clips, which Mozilla validates upon submission to ensure dataset quality.
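To illustrate the kind of automated quality gate a crowdsourced speech pipeline might apply before a clip enters a dataset, here is a minimal sketch in Python. The checks (duration bounds and a minimum sample rate) and the function names are illustrative assumptions, not Mozilla's actual validation process, which relies on community listeners reviewing clips.

```python
import io
import wave

def validate_clip(wav_bytes, min_s=1.0, max_s=15.0, min_rate=16000):
    """Hypothetical quality gate for a donated voice clip: accept it only
    if the duration falls within bounds and the sample rate is adequate.
    Illustrative only; real Common Voice validation is done by reviewers."""
    with wave.open(io.BytesIO(wav_bytes)) as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return rate >= min_rate and min_s <= duration <= max_s

def make_clip(seconds, rate=16000):
    """Generate a silent mono 16-bit WAV of the given length for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()

print(validate_clip(make_clip(3.0)))   # 3-second clip: within bounds
print(validate_clip(make_clip(0.2)))   # 0.2-second clip: too short
```

In practice, a stage like this would run before human review, filtering out clips that are obviously unusable so validators only listen to plausible recordings.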