
Pluto, an artificial intelligence data and solutions company, announced on the 10th that it has launched a new project to collect high-quality Arabic speech data to improve the multilingual recognition rate of AI models.
The project aims to improve performance on Arabic, a language for which speech-to-text (STT) models have shown relatively low recognition rates. In addition to the standard variety, Modern Standard Arabic (MSA), Arabic has more than 30 dialects. Code-switching, in which speakers frequently mix the standard language and dialects in everyday conversation, makes Arabic a language for which AI training data is considered difficult to build.
Pluto is running an Arabic speech data collection event using "Arcade," the voice data collection feature built into its mobile application. Participants read the sentences presented to them and record their voices, and an AI system analyzes the recordings to determine the dialect type. If the dialect is unclear, additional sentences are presented to prompt the participant to record again, thereby improving data accuracy.
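The company has not published how Arcade implements this loop, so the following is only a minimal Python sketch of the flow as described: record a sentence, classify the dialect, and present another sentence if the result is unclear. Every name here (`Session`, `collect`, `record`, `classify`), the confidence threshold, and the cap on prompts are hypothetical assumptions for illustration, not Pluto's actual API or settings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical values; the article does not state any threshold or prompt limit.
CONFIDENCE_THRESHOLD = 0.8
MAX_PROMPTS = 3


@dataclass
class Session:
    """Recordings gathered from one participant and the resolved dialect, if any."""
    recordings: List[str] = field(default_factory=list)
    dialect: Optional[str] = None


def collect(
    prompts: Sequence[str],
    record: Callable[[str], str],
    classify: Callable[[List[str]], Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
    max_prompts: int = MAX_PROMPTS,
) -> Session:
    """Present sentences one at a time until the dialect is clear or prompts run out."""
    session = Session()
    for prompt in prompts[:max_prompts]:
        # The participant reads the sentence aloud; `record` returns the audio.
        session.recordings.append(record(prompt))
        # The classifier sees all recordings so far and returns (label, confidence).
        label, confidence = classify(session.recordings)
        if confidence >= threshold:
            session.dialect = label
            break
        # Otherwise the dialect is still unclear, so the next sentence is presented.
    return session
```

In this sketch, a session whose `dialect` is still `None` after the last prompt would be treated as unresolved rather than assigned a guessed label, which is one way to keep uncertain samples out of the labeled data.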
The company explained that it launched the project to respond proactively to both actual project requests and anticipated demand, as demand for multilingual voice data continues to grow, particularly among global big tech companies.
Pluto believes that this data collection will enable the creation of training data that reflects linguistic diversity, including speakers' intonation, pronunciation patterns, and vocabulary choices. On this basis, Pluto plans to mitigate AI learning biases caused by disparities in linguistic resources and to develop a dataset capable of achieving high recognition rates in real-world environments.
Lee Jeong-su, CEO of Pluto, said, “Arabic is a major language used by over 400 million people around the world, but it is a low-resource language with relatively insufficient data for AI training.” He added, “Through this project, we will contribute to improving the quality of Arabic language recognition in global AI models by building data that faithfully reflects the actual usage context of Arabic.”