
Pluto announced on the 19th that it has participated in the 'Korean-Foreign Language Parallel Corpus Construction Project' hosted by the National Institute of the Korean Language for the fifth consecutive year. This project aims to build high-quality language data for the development of artificial intelligence (AI) technology, and focuses on protecting the data sovereignty of Korean language and culture and supporting the development of Korean-style AI technology.
Pluto has taken part in this project every year since 2021, recording a cumulative 11.5 billion won in orders and building 55 million phrases. This year's project is being carried out in cooperation with the Kyunghee University Industry-Academic Cooperation Foundation, and of the total 4.2 billion won project, Pluto is in charge of building a parallel corpus worth 2.09 billion won. The project will build a parallel corpus totaling 9 million phrases across nine languages: Vietnamese, Indonesian, Thai, Hindi, Khmer, Tagalog, Russian, Uzbek, and English.
Over five years of participation, Pluto has built language data essential for developing AI-based translation software and natural language processing (NLP) technology. The data will be used in technology development supporting the government-led growth of the language and culture industries. In particular, by turning low-resource languages of the ASEAN-India and Eurasian regions into usable data, the project is expected to resolve existing data imbalances and promote linguistic and cultural exchange between countries.
Lee Jeong-su, CEO of Pluto, said, “It is meaningful that Pluto’s language data construction experience and expertise have been recognized and that we have been able to carry out the project for five consecutive years,” and added, “We will continue to supply high-quality language data and contribute to strengthening the global competitiveness of Korean artificial intelligence technology.”
The language data built through this project can be accessed through 'Everyone's Corpus', an integrated language information sharing system operated by the National Institute of the Korean Language, and is expected to serve as an important resource for research and technology development. It is also expected to be valuable supporting material for domestic companies planning to enter countries where low-resource languages are spoken.