Trillion Labs Open-Sources 'Tri-21B', an LLM Pre-Trained Entirely from Scratch

– Cut training costs to 1/12 through its proprietary from-scratch pre-training method and Cross-lingual Document Attention (XLDA), achieving what the company describes as the most ideal balance between cost and performance

– Performed on par with global models on difficult reasoning benchmarks such as mathematics and coding, with particularly strong capabilities in Korean language comprehension

– Plans to develop large language models with its own technology, expand the application of AI across industries through a full-size LLM portfolio, and secure technological leadership

Trillion Labs (CEO Jae-min Shin) has released its next-generation large language model (LLM) 'Tri-21B' as open source. The model is designed to go beyond simple text generation and simultaneously perform high-level language understanding and complex problem solving.

Compared to its predecessor, Trillion-7B, Tri-21B expands the parameter count more than threefold to roughly 21 billion, significantly improving performance, while remaining light and efficient enough to run smoothly on a single GPU.

Developed completely from scratch with Trillion Labs' own LLM engine, the model is built to deliver strong performance on tasks requiring high-precision reasoning. It adopts a Chain-of-Thought (CoT) structure that generates structured answers to problems requiring step-by-step reasoning, such as mathematics and coding, and in particular applies Cross-lingual Document Attention (XLDA), a technology unique to Trillion Labs.
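For readers who want to try the released weights, a minimal usage sketch in Python is shown below, assuming the model is distributed in the standard Hugging Face Transformers format. The repository id "trillionlabs/Tri-21B", the prompt, and the generation settings are illustrative assumptions, not details confirmed by the announcement.

```python
# Minimal sketch: load the open-sourced model and ask for a step-by-step
# (Chain-of-Thought style) answer. The repository id below is an assumption
# for illustration; check the official release for the actual identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trillionlabs/Tri-21B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the weights in their released precision
    device_map="auto",    # place the ~21B parameters on the available GPU
)

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```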

XLDA is a training-data methodology that effectively transfers English-based knowledge to low-resource languages such as Korean and Japanese, cutting training costs to 1/12 of the previous level. This is significant in that it lays the groundwork for dramatically wider use of LLMs in industries where data is scarce. XLDA also enables more natural and accurate sentence generation not only in Korean but also in other Northeast Asian languages with limited data, such as Japanese.
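The announcement does not spell out XLDA's internal mechanics, but the name suggests an attention pattern over packed multilingual documents. The sketch below illustrates, under that assumption, how a packing-aware attention mask could let an English document and its Korean counterpart attend to each other while keeping unrelated documents isolated; the pairing rule and helper function are hypothetical illustrations, not Trillion Labs' actual implementation.

```python
# Conceptual sketch only: one plausible cross-lingual document attention mask
# for packed pre-training. The press release does not describe XLDA's exact
# mechanism, so the pairing rule here is an illustrative assumption.
import torch

def build_packed_mask(doc_ids, pair_ids):
    """doc_ids[i]  : which document token i belongs to
       pair_ids[i] : group id shared by a document and its cross-lingual pair"""
    doc_ids = torch.tensor(doc_ids)
    pair_ids = torch.tensor(pair_ids)
    n = len(doc_ids)
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))      # causal LM mask
    same_doc = doc_ids[:, None] == doc_ids[None, :]               # ordinary packing
    same_pair = pair_ids[:, None] == pair_ids[None, :]            # EN <-> KO pair
    return causal & (same_doc | same_pair)

# Tokens 0-3: English doc (doc 0); tokens 4-7: its Korean counterpart (doc 1);
# tokens 8-9: an unrelated document (doc 2). Docs 0 and 1 share pair id 0.
mask = build_packed_mask(doc_ids=[0, 0, 0, 0, 1, 1, 1, 1, 2, 2],
                         pair_ids=[0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
print(mask.int())
```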

Tri-21B demonstrated performance comparable to that of representative global mid-sized models such as Alibaba's Qwen 3, Meta's LLaMA 3, and Google's Gemma 3 on difficult reasoning-oriented benchmarks covering general knowledge (MMLU), Korean language understanding (KMMLU), mathematics (MATH), and coding (MBPP Plus). It showed particular strength in practical problem solving, scoring 77.93 on MMLU (85 with CoT), 77.89 on MATH, and 75.4 on MBPP Plus.

It also stood out on major Korean benchmarks. It scored 86.62 on HAE-RAE, which measures understanding of Korean culture, and 62 (70 with CoT) on KMMLU, which tests Korean knowledge and reasoning, results significantly higher than those of comparable global models and evidence of unrivaled Korean comprehension in vocabulary, contextual understanding, and cultural context. It also produced stable results in fields that demand high reliability, such as finance, medicine, and law, raising the prospect of adoption across industries.

Jae-min Shin, CEO of Trillion Labs, said, “Tri-21B effectively transfers the performance of a large 70B model to a 21B model through our flywheel structure, and achieves the most ideal balance to date between model size, cost, and performance.” He added, “With this model, we will rapidly improve both cost efficiency and performance through high-performance LLMs pre-trained from the ground up, raising the maturity of Korean AI technology, and together with Tri-70B, to be released later, we will complete our full-size LLM portfolio.”

Trillion Labs, founded in August 2024, is the only startup in Korea to have independently designed a Korean-centered LLM and pre-trained it from scratch. The team is composed of top AI engineers and researchers from Korea and abroad, including Jae-min Shin, a pioneer in generative AI, with backgrounds spanning KAIST, Oxford, Berkeley, Amazon, and Naver. The company raised $5.8 million (approximately 9 billion won) in pre-seed investment in September 2024, and in March 2025 it released the preview model Trillion-7B (Trillion-7B-preview) as open source.

CEO Jae-min Shin has been a leading researcher in empathetic dialogue systems since 2017 and was a key contributor to the pre-training of Naver's HyperCLOVA X (7B–60B models). The 'Prometheus' paper series, on which he was lead author, won a 2025 Best Paper Award from the North American Chapter of the Association for Computational Linguistics (NAACL). He has also been recognized for his work on Korean-style LLMs and gave an invited talk on 'Sovereign AI' at NVIDIA GTC 2025, contributing to Korea's international standing in AI as a representative domestic AI company.