KULLM-R
Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both correct reasoning paths and correct answers for such queries, offering better reasoning efficiency and language transferability to Korean than general-purpose reasoning models. A reinforcement learning strategy is employed for efficient reasoning-path exploration and Korean-specific generation.
Model Details
- Model Name: KULLM-R
- Developers: Seungyoon Lee, Minhyuk Kim, Dongjun Kim, Gyuho Shim, and Chanjun Park, supported by the NLP&AI Lab at Korea University
- Languages: Korean, English
- Objective: Producing efficient and interpretable reasoning paths and answers for high-level Korean reasoning queries
- Training Framework: verl, PyTorch, Transformers
- Parameter Size: 8B
Model Description
KULLM-R, built on Qwen3-8B, is distinguished from standard reasoning LLMs by its focus on reinforcement learning-based reasoning path exploration and its strong proficiency in Korean. It is trained to generate efficient reasoning paths for both English and Korean problems and provides well-structured, readable answers in Korean, delivering strong interpretability and an outstanding user experience for Korean speakers.
Key Features
- Reasoning Efficiency Aware Reinforcement Learning: Introduces RL techniques considering both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
- Reasoning Path Pruning: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
- High Readability in Korean: Improves both logical reasoning and natural Korean expression in generated answers.
- Adaptive Length Penalty: Adaptive penalties optimize the reasoning process according to the question's complexity and difficulty, ensuring efficient solutions for various math problems (a minimal illustrative sketch follows this list).
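The adaptive length penalty can be pictured as a reward-shaping term that keeps the correctness signal intact while discouraging reasoning that overruns a difficulty-dependent token budget. The snippet below is only an illustrative sketch under that assumption; the function name, budget values, and penalty weight are placeholders, not the released KULLM-R training code.

```python
# Illustrative sketch of an adaptive length penalty reward (not the actual
# KULLM-R training code). Assumptions: `is_correct` checks the final boxed
# answer, and `difficulty` in [0, 1] comes from an external estimate.
def adaptive_reward(response_len: int,
                    is_correct: bool,
                    difficulty: float,
                    base_budget: int = 2048,
                    max_budget: int = 8192,
                    penalty_weight: float = 0.5) -> float:
    """Correctness reward minus a length penalty whose budget grows with difficulty."""
    correctness = 1.0 if is_correct else 0.0
    # Harder questions are allowed longer reasoning before the penalty kicks in.
    budget = base_budget + difficulty * (max_budget - base_budget)
    overflow = max(0.0, response_len - budget) / max_budget
    return correctness - penalty_weight * overflow
```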
Data & Training Process
- Data Sources: ko-limo (only 817 rows)
- Training Strategy: Uses a difficulty-aware adaptive reward system, implementing reinforcement learning with a dynamic length penalty for optimal performance (see the difficulty-estimation sketch after this list).
- Iteration: The model repeatedly trains on high-difficulty examples to optimize reasoning path generation.
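One plausible way to realize the difficulty-aware part is to estimate difficulty per problem from the pass rate of sampled rollouts and feed that estimate into the length budget sketched above. This is an assumption for illustration, not a description of the released pipeline:

```python
# Illustrative sketch only: estimate problem difficulty from the pass rate of
# sampled rollouts. Names and the mapping are assumptions, not the KULLM-R code.
from statistics import mean

def estimate_difficulty(rollout_correct: list[bool]) -> float:
    """Low pass rate across sampled rollouts -> high difficulty, mapped to [0, 1]."""
    pass_rate = mean(1.0 if ok else 0.0 for ok in rollout_correct)
    return 1.0 - pass_rate

# Example: 8 rollouts for one problem, only 2 correct -> difficulty 0.75,
# which would widen the length budget in the adaptive penalty sketched above.
print(estimate_difficulty([True, False, False, True, False, False, False, False]))
```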
Quickstart
The Qwen3 code is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`. With `transformers<4.51.0`, you will encounter the following error: `KeyError: 'qwen3'`.
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nobrand/KULLM-R"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
system_prompt = "You are a helpful assistant.\nPlease reason step by step, and put your final answer within \\boxed{}"  # recommended system prompt
user_prompt = "1부터 1008까지의 자연수 중 1008과 서로소인 자연수의 갯수를 구하시오."  # "Find the number of natural numbers from 1 to 1008 that are coprime to 1008." (the answer is 288)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default settings in `generation_config.json`). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions.
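If you prefer to set these values explicitly rather than rely on the defaults in `generation_config.json`, they can be passed directly to `generate`. A minimal illustration, reusing `model` and `model_inputs` from the quickstart above:

```python
# Applying the recommended sampling settings explicitly (they should already be
# the defaults in generation_config.json, so this is just for illustration).
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
```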
Evaluation
- Shows superior reasoning efficiency, shorter reasoning steps, higher readability in Korean, and better explanation quality compared to models of similar scale when evaluated on HRM-8K.
| Task | Score | Think Step Length | Korean Response Ratio (%) |
|---|---|---|---|
| GSM8K | 91.9 | 896 | 94.47 |
| KSM | 70.9 | 7979 | 80.6 |
| MATH | 95.1 | 2668 | 96.12 |
| OMNI Math | 61.9 | 7987 | 73.91 |

Intended Use
- Solving complex Korean mathematical and logical reasoning problems
- Improved explainability for Korean logical reasoning
- Tutoring and educational support in reasoning fields
Citation
@misc{KULLM-R2025,
title = {KULLM-R: Korea University Large Language Model for Reasoning},
author = {Korea University NLP&AI Lab},
year = {2025},
}