Amazon Releases Open-Source ‘MASSIVE’ Speech Dataset for Researchers and Developers to Tinker With
Amazon has announced the release of an open-source dataset it calls ‘MASSIVE’. The aim is to help researchers scale natural-language-understanding (NLU) technology to every language.
The dataset is intended to help researchers build virtual assistants that generalize even to the world's least-resourced languages. Alongside the dataset, Amazon has also published open-source modeling code to help developers create more capable virtual assistants.
MASSIVE, short for Multilingual Amazon SLURP for Slot Filling, Intent Classification, and Virtual-assistant Evaluation, is a parallel dataset: the same utterances appear in every language it covers. It contains one million labeled utterances in 51 languages, many of which previously lacked properly labeled data, and it ships with open-source code that demonstrates how to perform massively multilingual NLU modeling. Alexa is currently available in seven languages; the company aims to extend it toward the more than 7,000 languages spoken around the world, including those in its most remote corners.
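To make the terms in the dataset's name concrete: a labeled utterance carries an intent label (the target for intent classification) and slot annotations (the target for slot filling), and in a parallel dataset the same request appears in each language with the same labels. The sketch assumes an inline `[slot_type : value]` annotation style of the kind used in SLURP-derived data; the `parse_annotated` function, the `LabeledUtterance` class, the `alarm_set` intent and the example utterances are hypothetical illustrations, not Amazon's released code or actual dataset records.

```python
import re
from dataclasses import dataclass, field

# Assumed annotation style: each slot value is written inline as
# "[slot_type : value]" inside the utterance text.
SLOT_PATTERN = re.compile(r"\[(\w+) : ([^\]]+)\]")


@dataclass
class LabeledUtterance:
    locale: str                 # language/region code, e.g. "en-US"
    intent: str                 # label for intent classification
    text: str                   # utterance with annotation brackets removed
    slots: list = field(default_factory=list)  # (slot_type, value) pairs for slot filling


def parse_annotated(locale: str, intent: str, annot_utt: str) -> LabeledUtterance:
    """Split an annotated utterance into plain text plus slot labels."""
    slots = [(m.group(1), m.group(2).strip()) for m in SLOT_PATTERN.finditer(annot_utt)]
    # Replace each "[type : value]" span with just its value to recover the plain text.
    text = SLOT_PATTERN.sub(lambda m: m.group(2).strip(), annot_utt)
    return LabeledUtterance(locale, intent, text, slots)


# Hypothetical parallel pair: the same request in two languages, same labels.
en = parse_annotated("en-US", "alarm_set", "wake me up at [time : five am] this week")
de = parse_annotated("de-DE", "alarm_set", "weck mich diese woche um [time : fünf uhr] auf")

print(en.text, en.slots)
print(de.text, de.slots)
# Parallel data: intent and slot types match across languages.
print(en.intent == de.intent and [s for s, _ in en.slots] == [s for s, _ in de.slots])
```

Because both records share the intent and the slot type while only the surface text differs, labels learned in one language can in principle be checked against another; that shared labeling is what makes parallel data useful for cross-lingual work.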
Professional translators built the dataset by localizing the English-only SLURP dataset into 50 diverse languages that lacked labeled data. According to Amazon, MASSIVE should be especially useful for improving spoken-language understanding, in which audio is first transcribed into text and NLU is then performed on that text. Natural language understanding (NLU) is a branch of natural language processing (NLP) that deals with converting human language into a machine-readable format.
Recent breakthroughs in speech recognition and natural language understanding (NLU) have paved the way for voice-activated digital assistants such as Siri, Bixby, and Google Assistant. Their main shortcoming is that they are available in only a handful of widely spoken languages. MASSIVE is a step toward a dataset that spans many less common languages, one that can be used to build multilingual NLU models that adapt smoothly to languages with scarce training data. The goal is to let people all over the world use conversational AI systems like Alexa in their native languages.
Amazon is also launching a new competition, Massively Multilingual NLU 2022 (MMNLU-22), that will use the MASSIVE dataset to encourage academics to design models that readily adapt to new languages and to spur more third-party apps for Alexa. The competition will be hosted on eval.ai and will include two tasks. In December, its results will be presented at the Massively Multilingual NLU 2022 workshop at EMNLP 2022, held in Abu Dhabi and online. The workshop will also feature talks by invited speakers, plus oral and poster sessions for submitted papers on multilingual natural-language processing.
Amazon's vision is for products such as Alexa and Echo to be available to all customers, on all devices.