Commit 397c365 (verified) · kbu1564 · 1 parent: 6c63d67

Update README.md

Files changed (1): README.md (+10, −10)
README.md CHANGED
@@ -9,7 +9,7 @@ library_name: transformers
 
 ## Overview
 
-HyperCLOVA X SEED Think 14B is a next-generation language model that moves beyond the conventional approach of simply increasing model size to improve performance. It combines [HyperCLOVA X’s lightweighting technology](https://tinyurl.com/y3hrfz67) for building high-efficiency LLMs with advanced reasoning capabilities. Its development relied on two key technologies: (1) Pruning & Knowledge Distillation, which achieves both compactness and high performance, and (2) a Reinforcement Learning (RL) pipeline, which maximizes reasoning ability. By pruning low-importance parameters and distilling knowledge from a large model into a smaller one, training costs have been significantly reduced. On top of this, [the latest RL recipe validated in HyperCLOVA X Think](https://arxiv.org/pdf/2506.22403) is applied in a multi-stage process: (1) Supervised Fine-Tuning (SFT), (2) Reinforcement Learning with Verifiable Rewards (RLVR), (3) Length Controllability (LC) for reasoning path optimization, and (4) a joint training of Reinforcement Learning from Human Feedback (RLHF) and RLVR.
+HyperCLOVA X SEED 14B Think is a next-generation language model that moves beyond the conventional approach of simply increasing model size to improve performance. It combines [HyperCLOVA X’s lightweighting technology](https://tinyurl.com/y3hrfz67) for building high-efficiency LLMs with advanced reasoning capabilities. Its development relied on two key technologies: (1) Pruning & Knowledge Distillation, which achieves both compactness and high performance, and (2) a Reinforcement Learning (RL) pipeline, which maximizes reasoning ability. By pruning low-importance parameters and distilling knowledge from a large model into a smaller one, training costs have been significantly reduced. On top of this, [the latest RL recipe validated in HyperCLOVA X Think](https://arxiv.org/pdf/2506.22403) is applied in a multi-stage process: (1) Supervised Fine-Tuning (SFT), (2) Reinforcement Learning with Verifiable Rewards (RLVR), (3) Length Controllability (LC) for reasoning path optimization, and (4) a joint training of Reinforcement Learning from Human Feedback (RLHF) and RLVR.
 
 It is a considerable challenge to equip a pruned, knowledge-distilled model with reasoning capabilities, since reductions in training costs and model size often degrade reasoning performance. However, through extensive research experience and persistent trial and error, the HyperCLOVA X team has succeeded in lowering training costs while maintaining reasoning performance comparable to that of larger, resource-intensive models.
 
@@ -23,11 +23,11 @@ It is a considerable challenge to equip a pruned, knowledge-distilled model with
 
 ## Training Cost
 
-`HyperCLOVA X SEED Think 14B` was trained at a significantly lower cost compared to high-performance external models of similar scale. By utilizing HCX’s lightweight training pipeline, it was trained at approximately **52.60×** lower cost than `Qwen2.5-14B` and **91.38×** lower cost than `Qwen3-14B`.
+`HyperCLOVA X SEED 14B Think` was trained at a significantly lower cost compared to high-performance external models of similar scale. By utilizing HCX’s lightweight training pipeline, it was trained at approximately **52.60×** lower cost than `Qwen2.5-14B` and **91.38×** lower cost than `Qwen3-14B`.
 
 | Model (Base) | GPU Hours (A100-80GB, MFU 50%) |
 | ------------------------------- | ---------------------------------- |
-| **HyperCLOVA X SEED Think 14B** | **68,049** |
+| **HyperCLOVA X SEED 14B Think** | **68,049** |
 | Qwen2.5-0.5B | 169,257 |
 | Qwen2.5-1.5B | 449,643 |
 | Qwen3-0.6B | 602,460 |
@@ -40,7 +40,7 @@ It is a considerable challenge to equip a pruned, knowledge-distilled model with
 
 ## Benchmarks
 
-Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SEED Think 14B demonstrates superior performance in Korean language and cultural understanding, while showing competitive performance in math and coding tasks, which are directly or indirectly related to agent capabilities. This trend remains consistent even when compared with larger models like Qwen3 32B and LG Exaone-Deep 32B.
+Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SEED 14B Think demonstrates superior performance in Korean language and cultural understanding, while showing competitive performance in math and coding tasks, which are directly or indirectly related to agent capabilities. This trend remains consistent even when compared with larger models like Qwen3 32B and LG Exaone-Deep 32B.
 
 ### Backbone Benchmarks Performance Comparison (Non-think)
 
@@ -48,7 +48,7 @@ Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SE
 
 | Model | Average | CLIcK | HAERAE-Bench | KOBEST | KorMedMCQA | KMMLU | KoBigBench | KoCommonGEN-v2 |
 | ------------------------------- | ------- | ------ | ------------ | ------ | ---------- | ------ | ---------- | -------------- |
-| **HyperCLOVA X SEED Think 14B** | 0.7269 | 0.7208 | 0.8506 | 0.8570 | 0.6411 | 0.5428 | 0.7482 | 0.6682 |
+| **HyperCLOVA X SEED 14B Think** | 0.7269 | 0.7208 | 0.8506 | 0.8570 | 0.6411 | 0.5428 | 0.7482 | 0.6682 |
 | QWEN3-8B | 0.6759 | 0.6206 | 0.6618 | 0.7919 | 0.6471 | 0.5543 | 0.7186 | 0.5773 |
 | QWEN3-14B | 0.7079 | 0.6707 | 0.6975 | 0.8174 | 0.6979 | 0.5864 | 0.7507 | 0.5927 |
 
@@ -56,7 +56,7 @@ Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SE
 **English/American Culture**
 | Model | Average | MMLU | BigBench-Hard | Hellaswag | Winogrande | PIQA | ARC-challenge | Social IQa |
 | ------------------------------- | ------- | ------ | ------------- | --------- | ---------- | ------ | ------------- | ---------- |
-| **HyperCLOVA X SEED Think 14B** | 0.6614 | 0.7121 | 0.6216 | 0.6125 | 0.7593 | 0.7791 | 0.6246 | 0.5205 |
+| **HyperCLOVA X SEED 14B Think** | 0.6614 | 0.7121 | 0.6216 | 0.6125 | 0.7593 | 0.7791 | 0.6246 | 0.5205 |
 | QWEN3-8B | 0.6548 | 0.7490 | 0.6072 | 0.5817 | 0.7198 | 0.7666 | 0.6433 | 0.5159 |
 | QWEN3-14B | 0.6807 | 0.7885 | 0.6325 | 0.6143 | 0.7356 | 0.8025 | 0.6698 | 0.5215 |
 
@@ -66,14 +66,14 @@ Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SE
 **Korean/Korea Culture**
 | Model | KMMLU | CSAT-ko-2025 | KorMedMCQA | KoBALT | HAERAE | CLIcK | KoBigBench | LogicKor |
 |-----------------------------------------|--------|--------|--------|--------|--------|--------|--------|------|
-| HyperCLOVA X SEED Think 14B **(Think)** | 0.6649 | 0.7516 | 0.6933 | 0.4500 | 0.8537 | 0.7280 | 0.7974 | 8.74 |
+| HyperCLOVA X SEED 14B Think **(Think)** | 0.6649 | 0.7516 | 0.6933 | 0.4500 | 0.8537 | 0.7280 | 0.7974 | 8.74 |
 | QWEN3-8B | 0.5543 | 0.7200 | 0.6782 | 0.3060 | 0.6618 | 0.6690 | 0.7850 | 8.92 |
 | QWEN3-14B | 0.4930 | 0.7710 | 0.6850 | 0.3840 | 0.7410 | 0.6880 | 0.8380 | 9.15 |
 
 **Coding/Math**
 | Model | GSM8k | MATH500 | HumanEval | MBPP |
 |-----------------------------------------|--------|--------|--------|--------|
-| HyperCLOVA X SEED Think 14B | 0.9553 | 0.9380 | 0.9451 | 0.8759 |
+| HyperCLOVA X SEED 14B Think | 0.9553 | 0.9380 | 0.9451 | 0.8759 |
 | QWEN3-14B | 0.9590 | 0.9680 | 0.9570 | 0.9080 |
 
 
@@ -81,8 +81,8 @@ Compared to global models of a similar scale, such as Qwen3 14B, HyperCLOVA X SE
 
 | Model | GSM8k | GPT4Eval | MT Bench | Arena-Hard-v0.1 |
 |---------------------------------------------|--------|--------|--------|--------|
-| HyperCLOVA X SEED Think 14B **(Non-think)** | 0.9348 | 0.6741 | 8.2063 | 0.2733 |
-| HyperCLOVA X SEED Think 14B **(Think)** | 0.9553 | 0.8200 | 8.8313 | 0.5826 |
+| HyperCLOVA X SEED 14B Think **(Non-think)** | 0.9348 | 0.6741 | 8.2063 | 0.2733 |
+| HyperCLOVA X SEED 14B Think **(Think)** | 0.9553 | 0.8200 | 8.8313 | 0.5826 |
 
 
 ## ChatML Block
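
The cost multipliers in the "Training Cost" section can be cross-checked against the one GPU-hour figure the table does show. A minimal sketch, assuming the ratios apply directly to the 68,049 A100-80GB GPU hours quoted for the 14B model (the Qwen totals below are implied by those ratios, not published figures from the table):

```python
# Cross-check of the cost ratios quoted in the README's "Training Cost" section.
# Only the 68,049 GPU-hour figure and the two multipliers (52.60x, 91.38x)
# appear in the text; the Qwen 14B totals computed here are implied values.
SEED_14B_GPU_HOURS = 68_049

cost_ratios = {"Qwen2.5-14B": 52.60, "Qwen3-14B": 91.38}

for model, ratio in cost_ratios.items():
    implied_hours = SEED_14B_GPU_HOURS * ratio
    print(f"{model}: ~{implied_hours:,.0f} implied A100-80GB GPU hours")
```

At MFU 50% on A100-80GB hardware, the implied totals land in the millions of GPU hours for both comparison models, consistent with the README's claim that the pruning-and-distillation pipeline cuts training cost by roughly two orders of magnitude.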