LIMO-Qwen3-8B-Math-Full-Precision-v3

Full-precision (bfloat16) merged model trained with LIMO methodology for mathematical reasoning.

Model Details

  • Base Model: Qwen/Qwen3-8B (full precision)
  • Training Method: LIMO (Less is More for Reasoning)
  • Dataset: GAIR/LIMO (817 high-quality samples)
  • Training Approach: LoRA fine-tuning → full merge (see the merge sketch below)
  • Model Size: ~16 GB (full precision)
  • Precision: bfloat16

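The pipeline above ends with the LoRA adapter being folded into the base weights. A minimal sketch of that merge step using peft's merge_and_unload; the adapter and output paths here are hypothetical:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bfloat16
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Attach the trained LoRA adapter ("./limo-lora-adapter" is a hypothetical path)
model = PeftModel.from_pretrained(base, "./limo-lora-adapter")

# Fold the adapter deltas into the base weights and drop the PEFT wrapper
model = model.merge_and_unload()

# Save the merged full-precision checkpoint (~16 GB in bfloat16)
model.save_pretrained("./limo-qwen3-8b-merged")
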
Training Configuration

  • LoRA Rank: 8 (conservative)
  • Learning Rate: 5e-6 (conservative)
  • Epochs: 1 (prevents overfitting)
  • Batch Size: 2 (memory optimized)
  • Gradient Accumulation: 4

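For orientation, a hedged sketch of how these hyperparameters map onto a peft LoraConfig and transformers TrainingArguments; lora_alpha, target_modules, and the output path are assumptions not stated on this card:

from peft import LoraConfig
from transformers import TrainingArguments

# Conservative LoRA setup matching the card: rank 8
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,  # assumption; not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# Trainer arguments matching the card's stated values
training_args = TrainingArguments(
    output_dir="./limo-lora",        # hypothetical
    learning_rate=5e-6,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size: 2 x 4 = 8
    bf16=True,
)
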
Expected Performance

  • Elementary Math: +2-4% improvement over base Qwen3-8B
  • High School Math: +2-4% improvement over base Qwen3-8B
  • Reasoning Quality: Enhanced step-by-step mathematical reasoning

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Cbgcbg/limo-qwen3-8b-math-full-precision_v3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Cbgcbg/limo-qwen3-8b-math-full-precision_v3",
    trust_remote_code=True
)

# Example usage
messages = [
    {"role": "user", "content": "Solve: 2x + 3 = 11"}
]

# Build the chat-formatted prompt and tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)

Model Comparison

| Model | Size | Precision | Performance |
|-------|------|-----------|-------------|
| Original Gasing-8B | 15.26 GB | Full | ✅ Baseline |
| Previous LIMO (quantized) | 5.55 GB | 4-bit | ❌ Degraded |
| This LIMO (full precision) | ~16 GB | bfloat16 | ✅ Expected +2-4% |

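The size column is consistent with the parameter count: 8.19B parameters at 2 bytes each in bfloat16 come to the 15.26 GiB listed above (about 16 GB decimal). A quick check:

# bfloat16 stores 2 bytes per parameter; 4-bit packing stores 0.5
params = 8.19e9
print(f"bf16:  {params * 2 / 1024**3:.2f} GiB")    # ~15.26 GiB (~16 GB decimal)
print(f"4-bit: {params * 0.5 / 1024**3:.2f} GiB")  # ~3.81 GiB before quantization overhead
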
Training Details

This model was trained using the LIMO methodology, which demonstrates that high-quality mathematical reasoning can be achieved with a minimal but carefully curated training set (817 samples, versus the 100k+ examples typical of math fine-tuning datasets).

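The 817-sample dataset is public on the Hugging Face Hub; a minimal sketch of inspecting it with the datasets library, printing the schema rather than assuming field names:

from datasets import load_dataset

# Load the curated LIMO training set
limo = load_dataset("GAIR/LIMO", split="train")

print(len(limo))          # expect 817 samples
print(limo.column_names)  # check the columns before writing any formatting code
print(limo[0])            # one curated problem with its reasoning chain
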
Key improvements:

  • ✅ Perfect format alignment with evaluation benchmarks
  • ✅ Conservative training parameters to prevent overfitting
  • ✅ Full precision training and inference
  • ✅ High-quality reasoning chains from GAIR/LIMO dataset

Citation

If you use this model, please cite the LIMO paper:

@misc{ye2025limo,
    title={LIMO: Less is More for Reasoning},
    author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
    year={2025},
    eprint={2502.03387},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}