KULLM-R
Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both correct reasoning paths and correct answers for such queries, offering better reasoning efficiency and language transferability to Korean than general-purpose reasoning models. A reinforcement learning strategy is employed for efficient reasoning-path exploration and Korean-specific generation.
Model Details
- Model Name: KULLM-R
- Developers: Seungyoon Lee, Minhyuk Kim, Dongjun Kim, Gyuho Shim, and Chanjun Park, supported by the NLP&AI Lab at Korea University
- Languages: Korean, English
- Objective: Producing efficient and interpretable reasoning paths and answers for high-level Korean reasoning queries
- Training Framework: verl, PyTorch, Transformers
- Parameter Size: 8B
Model Description
KULLM-R, built on Qwen3-8B, is distinguished from standard reasoning LLMs by its focus on reinforcement learning-based reasoning path exploration and its strong proficiency in Korean. It is trained to generate efficient reasoning paths for both English and Korean problems and provides well-structured, readable answers in Korean, delivering strong interpretability and an outstanding user experience for Korean speakers.
Key Features
- Reasoning Efficiency Aware Reinforcement Learning: Introduces RL techniques considering both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
- Reasoning Path Pruning: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
- High Readability in Korean: Improves both logical reasoning and natural Korean expression in generated answers.
- Adaptive Length Penalty: Adaptive penalties optimize the reasoning process according to the question's complexity and difficulty, ensuring efficient solutions for various math problems (a minimal illustrative sketch follows this list).
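The adaptive length penalty can be pictured as a reward-shaping term that keeps the correctness signal intact while discouraging reasoning that overruns a difficulty-dependent token budget. The snippet below is only an illustrative sketch under that assumption; the function name, budget values, and penalty weight are placeholders, not the released KULLM-R training code.

```python
# Illustrative sketch of an adaptive length penalty reward (not the actual
# KULLM-R training code). Assumptions: `is_correct` checks the final boxed
# answer, and `difficulty` in [0, 1] comes from an external estimate.
def adaptive_reward(response_len: int,
                    is_correct: bool,
                    difficulty: float,
                    base_budget: int = 2048,
                    max_budget: int = 8192,
                    penalty_weight: float = 0.5) -> float:
    """Correctness reward minus a length penalty whose budget grows with difficulty."""
    correctness = 1.0 if is_correct else 0.0
    # Harder questions are allowed longer reasoning before the penalty kicks in.
    budget = base_budget + difficulty * (max_budget - base_budget)
    overflow = max(0.0, response_len - budget) / max_budget
    return correctness - penalty_weight * overflow
```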
Data & Training Process
- Data Sources: ko-limo (only 817 rows)
- Training Strategy: Uses a difficulty-aware adaptive reward system, implementing reinforcement learning with a dynamic length penalty for optimal performance (see the difficulty-estimation sketch after this list).
- Iteration: The model repeatedly trains on high-difficulty examples to optimize reasoning path generation.
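One plausible way to realize the difficulty-aware part is to estimate difficulty per problem from the pass rate of sampled rollouts and feed that estimate into the length budget sketched above. This is an assumption for illustration, not a description of the released pipeline:

```python
# Illustrative sketch only: estimate problem difficulty from the pass rate of
# sampled rollouts. Names and the mapping are assumptions, not the KULLM-R code.
from statistics import mean

def estimate_difficulty(rollout_correct: list[bool]) -> float:
    """Low pass rate across sampled rollouts -> high difficulty, mapped to [0, 1]."""
    pass_rate = mean(1.0 if ok else 0.0 for ok in rollout_correct)
    return 1.0 - pass_rate

# Example: 8 rollouts for one problem, only 2 correct -> difficulty 0.75,
# which would widen the length budget in the adaptive penalty sketched above.
print(estimate_difficulty([True, False, False, True, False, False, False, False]))
```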
Quickstart
The Qwen3 code is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`. With `transformers<4.51.0`, you will encounter the following error: `KeyError: 'qwen3'`.
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nobrand/KULLM-R"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
system_prompt = "You are a helpful assistant.\nPlease reason step by step, and put your final answer within \\boxed{}"  # recommended system prompt
user_prompt = "1부터 1008까지의 자연수 중 1008과 서로소인 자연수의 갯수를 구하시오."  # "Find the number of natural numbers from 1 to 1008 that are coprime to 1008." (the answer is 288)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default settings in `generation_config.json`). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions.
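If you prefer to set these values explicitly rather than rely on the defaults in `generation_config.json`, they can be passed directly to `generate`. A minimal illustration, reusing `model` and `model_inputs` from the quickstart above:

```python
# Applying the recommended sampling settings explicitly (they should already be
# the defaults in generation_config.json, so this is just for illustration).
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
```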
Evaluation
- Shows superior reasoning efficiency, shorter reasoning steps, higher readability in Korean, and better explanation quality compared to models of similar scale when evaluated on HRM-8K.
| Task | Score | Think Step Length | Korean Response Ratio (%) |
|---|---|---|---|
| GSM8K | 91.9 | 896 | 94.47 |
| KSM | 70.9 | 7979 | 80.6 |
| MATH | 95.1 | 2668 | 96.12 |
| OMNI Math | 61.9 | 7987 | 73.91 |

Intended Use
- Solving complex Korean mathematical and logical reasoning problems
- Improved explainability for Korean logical reasoning
- Tutoring and educational support in reasoning fields
Citation
@misc{KULLM-R2025,
title = {KULLM-R: Korea University Large Language Model for Reasoning},
author = {Korea University NLP&AI Lab},
year = {2025},
}