KULLM-R

Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both correct reasoning paths and correct answers for such queries, offering better reasoning efficiency and stronger language transfer to Korean than general-purpose reasoning models. A reinforcement learning strategy is used for efficient reasoning path exploration and Korean-specific generation.

Model Details

  • Model Name: KULLM-R
  • Developer: Seungyoon Lee, Minhyuk Kim, Dongjun Kim, Gyuho Shim, and Chanjun Park, supported by the NLP&AI Lab at Korea University
  • Languages: Korean, English
  • Objective: Producing efficient and interpretable reasoning paths and answers for high-level Korean reasoning queries
  • Training Framework: verl, PyTorch, Transformers
  • Parameter Size: 8B

Model Description

KULLM-R, built on Qwen3-8B, is distinguished from standard reasoning LLMs by its focus on reinforcement learning-based reasoning path exploration and its strong command of Korean. It is trained to generate efficient reasoning paths for both English and Korean problems and to provide well-structured, readable answers in Korean, delivering strong interpretability and a good user experience for Korean speakers.

Key Features

  • Reasoning Efficiency Aware Reinforcement Learning: Introduces RL techniques considering both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
  • Reasoning Path Pruning: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
  • High Readability in Korean: Enhances both logical reasoning and natural Korean expression in the generated answers.
  • Adaptive Length Penalty: An adaptive penalty tunes the length of the reasoning process to each question's complexity and difficulty, ensuring efficient solutions across math problems of varying hardness (see the sketch below).
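
The exact reward formulation is not published in this card; the following is a minimal, hypothetical sketch of what a difficulty-adaptive length penalty can look like, where difficulty is assumed to be a per-problem scalar in [0, 1] (for example, estimated from rollout pass rates). All names and constants here are illustrative, not KULLM-R's actual implementation.

# Hypothetical sketch of an adaptive length-penalty reward (not the published recipe).
def adaptive_length_reward(is_correct: bool,
                           num_reasoning_tokens: int,
                           difficulty: float,
                           base_budget: int = 1024,
                           max_extra_budget: int = 8192,
                           penalty_scale: float = 0.5) -> float:
    """Correctness reward with a length penalty that relaxes on harder problems."""
    # Harder problems get a larger reasoning-token budget before any penalty applies.
    budget = base_budget + difficulty * max_extra_budget

    # Penalize only the tokens that exceed the difficulty-adjusted budget.
    overflow = max(0.0, num_reasoning_tokens - budget)
    length_penalty = penalty_scale * overflow / max_extra_budget

    correctness = 1.0 if is_correct else 0.0
    return correctness - length_penalty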

Data & Training Process

  • Data Sources: ko-limo (only 817 rows)
  • Training Strategy: Uses a difficulty-aware adaptive reward system, implementing reinforcement learning with a dynamic length penalty.
  • Iteration: The model is trained repeatedly on high-difficulty examples to optimize reasoning path generation; a hypothetical sketch of such a sampling scheme follows.
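
As a rough illustration of the iteration scheme above, a difficulty-aware sampler might oversample problems the current policy still fails on. The pass-rate proxy and all names below are assumptions for illustration, not the documented training code.

import random

# Hypothetical sketch: sample harder (low pass-rate) problems more often.
def sample_training_batch(problems, pass_rates, batch_size=32):
    """problems: list of dicts with an "id" key; pass_rates: id -> recent rollout pass rate."""
    # Weight each problem by how often the current policy fails on it,
    # with a small floor so already-solved problems are still revisited occasionally.
    weights = [1.0 - pass_rates.get(p["id"], 0.0) + 0.05 for p in problems]
    return random.choices(problems, weights=weights, k=batch_size)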

Quickstart

Support for Qwen3 is included in the latest Hugging Face transformers, and we advise you to use the latest version of transformers.

With transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3'
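
If you are unsure which version is installed, a quick check such as the following (using the packaging library, which ships as a transformers dependency) avoids the confusing traceback:

import transformers
from packaging import version

# The qwen3 architecture requires transformers >= 4.51.0; otherwise upgrade transformers.
assert version.parse(transformers.__version__) >= version.parse("4.51.0"), \
    f"transformers {transformers.__version__} is too old for the qwen3 architecture"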

The following code snippet illustrates how to use the model to generate content from given inputs.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nobrand/KULLM-R"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
system_prompt = "You are a helpful assistant.\nPlease reason step by step, and put your final answer within \\boxed{{}}" # Recommend to use given system prompt
user_promt = "1๋ถ€ํ„ฐ 1008๊นŒ์ง€์˜ ์ž์—ฐ์ˆ˜ ์ค‘ 1008๊ณผ ์„œ๋กœ์†Œ์ธ ์ž์—ฐ์ˆ˜์˜ ๊ฐฏ์ˆ˜๋ฅผ ๊ตฌํ•˜์‹œ์˜ค."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_promt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

As recommended for Qwen3, use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0 (the default settings in generation_config.json). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions.
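
If you prefer to set these sampling parameters explicitly rather than relying on generation_config.json, they can be passed directly to generate(); note that min_p support assumes a reasonably recent transformers release:

# explicit sampling parameters matching the recommended settings
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)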

Evaluation

  • Shows superior reasoning efficiency, shorter reasoning steps, higher readability in Korean, and better explanation quality compared to models of similar scale when evaluated on HRM-8K.
    Task        Score   Think Step Length   Korean Response Ratio (%)
    GSM8k       91.9    896                 94.47
    KSM         70.9    7979                80.6
    MATH        95.1    2668                96.12
    OMNI Math   61.9    7987                73.91

Intended Use

  • Solving complex Korean mathematical and logical reasoning problems
  • Improved explainability for Korean logical reasoning
  • Tutoring and educational support in reasoning fields

Citation

@misc{KULLM-R2025,
  title   = {KULLM-R: Korea University Large Language Model for Reasoning},
  author  = {Korea University NLP\&AI Lab},
  year    = {2025},
}