Fast-Math-R1-14B
By applying SFT and GRPO to difficult math problems, we enhanced the performance of DeepSeek-R1-Distill-Qwen-14B and developed Fast-Math-R1-14B, which achieves up to 60% faster inference (approximately 30% on average) while maintaining accuracy.

Technical details can be found in the Kaggle Discussion and on GitHub.
| Model | Token budget | AIME 2024 Pass@1 (avg. 64) | AIME 2024 output tokens | AIME 2025 Pass@1 (avg. 64) | AIME 2025 output tokens |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-14B | 16384 | 63.3 | 9590 | 46.7 | 10602 |
| | 12800 | 58.0 | 8632 | 41.9 | 9363 |
| | 8192 | 45.6 | 6638 | 30.6 | 6897 |
| Light-R1-14B-DS | 16384 | 66.8 | 10146 | 51.3 | 11308 |
| | 12800 | 59.2 | 9110 | 43.8 | 9834 |
| | 8192 | 42.4 | 7020 | 30.4 | 7124 |
| Fast-Math-R1-14B | 16384 | 66.0 | 7932 | 49.2 | 9066 |
| | 12800 | 63.0 | 7449 | 46.1 | 8282 |
| | 8192 | 51.4 | 5963 | 37.2 | 6256 |
| Fast-Math-R1-14B-SFT Only | 16384 | 65.2 | 10268 | 49.7 | 11264 |
| | 12800 | 57.2 | 9180 | 42.8 | 9805 |
| | 8192 | 41.3 | 7015 | 30.1 | 7074 |
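The metrics in the table above follow a straightforward protocol: sample 64 completions per problem under a fixed token budget, count a completion as correct when its final `\boxed{}` answer matches the reference, and average accuracy and output length. The snippet below is a minimal sketch of that procedure, not the authors' evaluation harness; the `problems` list, the `extract_answer` helper, and the answer-matching rule are illustrative assumptions, and the exact AIME prompts and grading may differ.

```python
import re
from statistics import mean

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = 'RabotniKuma/Fast-Math-R1-14B'
llm = LLM(model=model_path, max_model_len=16384, gpu_memory_utilization=0.9)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# (question, reference answer) pairs -- placeholder data, not the AIME set.
problems = [
    ('Compute 1 + 1. Put the answer in \\boxed{}.', '2'),
]

sampling_params = SamplingParams(
    n=64,              # 64 samples per problem for Pass@1 (avg. 64)
    temperature=1.0,
    top_p=0.90,
    max_tokens=16384,  # token budget
)

prompts = [
    tokenizer.apply_chat_template(
        [{'role': 'user', 'content': question}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for question, _ in problems
]

outputs = llm.generate(prompts, sampling_params)

def extract_answer(text):
    """Return the content of the last \\boxed{...} in the text, if any."""
    matches = re.findall(r'\\boxed\{([^}]*)\}', text)
    return matches[-1] if matches else None

correct, n_tokens = [], []
for (question, reference), request_output in zip(problems, outputs):
    for completion in request_output.outputs:
        correct.append(float(extract_answer(completion.text) == reference))
        n_tokens.append(len(completion.token_ids))

print(f'Pass@1 (avg.): {100 * mean(correct):.1f}')
print(f'Mean output tokens: {mean(n_tokens):.0f}')
```

For plain single-prompt inference, see the example below.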
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = 'RabotniKuma/Fast-Math-R1-14B'

# Load the model with vLLM.
vllm_engine = LLM(
    model=model_path,
    max_model_len=8192,
    gpu_memory_utilization=0.9,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

sampling_params = SamplingParams(
    temperature=1.0,
    top_p=0.90,
    min_p=0.05,
    max_tokens=8192,
    stop='</think>',  # Important: early stop at </think> to save output tokens
)
messages = [
    {
        'role': 'user',
        'content': (
            'Solve the problem, and put the answer in \\boxed{}. '
            'Sarah is twice as old as her youngest brother. If the difference '
            'between their ages is 15 years, how old is her youngest brother?'
        )
    }
]
prompt = tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=False,
    add_generation_prompt=True,
)
response = vllm_engine.generate(prompt, sampling_params=sampling_params)
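Because sampling stops early at `</think>`, the returned text is the reasoning trace itself, and the model's final `\boxed{}` answer can be pulled from it with a regex. The parsing step below is an illustrative addition rather than part of the original example; for the question above the expected answer is 15.

```python
import re

# response holds one RequestOutput for the single prompt above; generation
# stopped at '</think>', so the text is the (truncated) reasoning trace.
generated_text = response[0].outputs[0].text
boxed = re.findall(r'\\boxed\{([^}]*)\}', generated_text)
print(boxed[-1] if boxed else 'No \\boxed{} answer found within the token budget.')
```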