---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
language:
- ko
- en
tags:
- reasoning
- LLMs
- Korean
---

# KULLM-R

Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both the correct reasoning paths and answers for such queries, offering enhanced reasoning efficiency and stronger language transferability to Korean compared to general-purpose reasoning models. A reinforcement learning strategy is employed for efficient reasoning path exploration and Korean-specific generation.

## Model Details

- **Model Name**: KULLM-R
- **Developers**: Seungyoon Lee, Minhyuk Kim, Dongjun Kim, Gyuho Shim, and Chanjun Park, supported by the [NLP&AI Lab at Korea University](https://nlp.korea.ac.kr/)
- **Languages**: Korean, English
- **Objective**: Producing efficient and interpretable reasoning paths and answers for high-level Korean reasoning queries
- **Training Framework**: verl, PyTorch, Transformers
- **Parameter Size**: 8B

### Model Description

KULLM-R is built on Qwen3-8B and is distinguished from standard reasoning LLMs by its focus on reinforcement learning-based reasoning path exploration and its strong proficiency in Korean. It is trained to generate efficient reasoning paths for both English and Korean problems and to provide well-structured, readable answers in Korean, delivering strong interpretability and an outstanding user experience for Korean speakers.

### Key Features

- **Reasoning Efficiency-Aware Reinforcement Learning**: Introduces RL techniques that consider both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
- **Reasoning Path Pruning**: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
- **Highly Readable Korean Output**: Improves both logical reasoning and natural Korean expression in generated answers.
- **Adaptive Length Penalty**: Adapts the penalty on reasoning length to each question's complexity and difficulty, encouraging efficient solutions across a range of math problems (see the illustrative sketch at the end of this card).

## Data & Training Process

- **Data Sources**: ko-limo (only 817 examples)
- **Training Strategy**: Reinforcement learning with a difficulty-aware adaptive reward system and a dynamic length penalty.
- **Iteration**: The model is repeatedly trained on high-difficulty examples to optimize reasoning path generation.

## Quickstart

The code for Qwen3 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```

The following code snippet illustrates how to use the model to generate content from a given input.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nobrand/KULLM-R"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input (using the system prompt below is recommended)
system_prompt = "You are a helpful assistant.\nPlease reason step by step, and put your final answer within \\boxed{}"
user_prompt = "1부터 1008까지의 자연수 중 1008과 서로소인 자연수의 개수를 구하시오."
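# English: "Find the number of natural numbers from 1 to 1008 that are
# coprime to 1008." (Expected answer: phi(1008) = 288, since 1008 = 2^4 * 3^2 * 7.)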
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

> [!NOTE]
> As mentioned in the Qwen3 model card, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default settings in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.

## Evaluation

- Shows superior reasoning efficiency, shorter reasoning steps, higher readability in Korean, and better explanation quality compared to models of similar scale when evaluated on HRM-8K.

| Task      | Score (%) | Think Step Length | Korean Response Ratio (%) |
|-----------|:---------:|:-----------------:|:-------------------------:|
| GSM8K     | 91.9      | 896               | 94.47                     |
| KSM       | 70.9      | 7979              | 80.6                      |
| MATH      | 95.1      | 2668              | 96.12                     |
| OMNI Math | 61.9      | 7987              | 73.91                     |

## Intended Use

- Solving complex Korean mathematical and logical reasoning problems
- Improved explainability for Korean logical reasoning
- Tutoring and educational support in reasoning-related fields

## Citation

```bibtex
@misc{KULLM-R2025,
  title  = {KULLM-R: Korea University Large Language Model for Reasoning},
  author = {Korea University NLP\&AI Lab},
  year   = {2025},
}
```
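## Appendix: Adaptive Length Penalty (Illustrative Sketch)

For readers curious how the difficulty-aware adaptive length penalty described under "Data & Training Process" might look in code, below is a minimal sketch. The exact reward formulation used to train KULLM-R is not published; the function name, budget constants, and difficulty scaling here are all assumptions for illustration only.

```python
# Illustrative only: a difficulty-aware, length-penalized reward in the spirit
# of the "Adaptive Length Penalty" described above. All names and constants
# are assumptions; the actual reward used to train KULLM-R is not published.

def adaptive_length_reward(
    is_correct: bool,
    num_think_tokens: int,
    difficulty: float,        # assumed normalized to [0, 1]
    base_budget: int = 1024,  # hypothetical thinking budget for the easiest problems
    max_budget: int = 8192,   # hypothetical budget for the hardest problems
    penalty_weight: float = 0.5,
) -> float:
    """Pay full reward for a correct answer within a difficulty-scaled token
    budget; charge a growing penalty for thinking tokens spent beyond it."""
    if not is_correct:
        return 0.0  # no partial credit for incorrect answers
    budget = base_budget + difficulty * (max_budget - base_budget)
    overshoot = max(0.0, (num_think_tokens - budget) / budget)
    return max(0.0, 1.0 - penalty_weight * overshoot)

# A correct answer within budget keeps the full reward ...
print(adaptive_length_reward(True, 3000, difficulty=0.5))  # 1.0 (budget ~4608 tokens)
# ... while the same answer on an easy problem is penalized for overthinking.
print(adaptive_length_reward(True, 9000, difficulty=0.2))  # 0.0 (budget ~2458 tokens)
```

Under this kind of scheme, the model is never rewarded for a wrong answer no matter how short its trace is, while a correct answer loses reward only when its thinking length exceeds a budget that grows with problem difficulty.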