---
datasets:
  - ZTE-AIM/32B_LLM_AdaptiveMath_data
  - ZTE-AIM/32B_LLM_AdaptiveCode_data
base_model:
  - DeepSeek-R1-Distill-Qwen-32B
---

English | 中文

  • 32B_LLM_AdaptiveMath_data: [🤗 HF Dataset]
  • LLM-Adaptive-CoT-Code-data: [🤗 HF Dataset]
  • LLM-Adaptive-ZMath-model-32B: [🤗 LLM-Adaptive-ZMath-model-32B]
  • LLM-Adaptive-ZCode-model-32B: [🤗 LLM-Adaptive-ZCode-model-32B]

Model Overview

This work presents a fine-tuned reasoning model built on DeepSeek-R1-Distill-Qwen-32B through a novel LLM-Adaptive Question Difficulty Grading method. Unlike traditional CoT generation approaches, the method leverages the reasoning strength of DeepSeek-R1 (671B) to distill high-quality chain-of-thought (CoT) data. The core innovation is the dynamic construction of difficulty-aligned datasets based on the target LLM's own problem-solving capability.

The proposed approach first evaluates question difficulty adaptively, using the target model itself, and then performs difficulty-tailored sampling and response generation. This lets the model learn efficiently from progressively challenging problems, boosting reasoning performance across domains such as mathematical problem solving and code generation.
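
The released pipeline is not reproduced on this card, but the grading-and-sampling idea can be sketched in a few lines: estimate each question's difficulty from the target model's own pass rate over several attempts, then assemble a difficulty-aligned subset. Everything below (attempt_solve, the bucket thresholds, the bucket sizes) is a hypothetical illustration, not the released implementation.

import random
from collections import defaultdict

def attempt_solve(question: str) -> bool:
    # Hypothetical stand-in for one target-model attempt at `question`.
    # In practice this would sample a completion and verify the final answer.
    return random.random() < 0.5

def grade_difficulty(question: str, n_attempts: int = 8) -> str:
    # Grade difficulty by the target LLM's own pass rate (assumed bucketing).
    pass_rate = sum(attempt_solve(question) for _ in range(n_attempts)) / n_attempts
    if pass_rate >= 0.75:
        return "easy"
    if pass_rate >= 0.25:
        return "medium"
    return "hard"

def build_adaptive_dataset(questions, per_bucket):
    # Collect questions per difficulty level, then cap each bucket.
    buckets = defaultdict(list)
    for q in questions:
        buckets[grade_difficulty(q)].append(q)
    return {level: qs[:per_bucket] for level, qs in buckets.items()}

dataset = build_adaptive_dataset([f"question {i}" for i in range(100)], per_bucket=10)
print({level: len(qs) for level, qs in dataset.items()})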

The fine-tuned variants ZMath-32B and ZCode-32B outperform baseline models such as DeepSeek-R1-Distill-Qwen-32B and phi-4, even with limited high-quality data. Notably, ZMath-32B trained on only 2K PRM-graded CoT samples surpasses its baseline across all math benchmarks, confirming the effectiveness of the adaptive CoT generation methodology.
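
"PRM-graded" here means scoring candidate CoT traces with a process reward model (PRM) and keeping only the highest-scoring ones for fine-tuning. A minimal sketch of that filtering step follows; prm_score is a hypothetical placeholder for a real process reward model.

def prm_score(cot_steps):
    # Hypothetical PRM: a real process reward model would score each
    # reasoning step; here we just return a placeholder mean in [0, 1].
    return sum(1.0 for step in cot_steps if step.strip()) / max(len(cot_steps), 1)

def filter_cot_samples(samples, keep):
    # Keep the `keep` highest-scoring CoT samples for fine-tuning.
    ranked = sorted(samples, key=lambda s: prm_score(s["steps"]), reverse=True)
    return ranked[:keep]

samples = [
    {"question": "12 * (3 + 4) = ?", "steps": ["3 + 4 = 7", "12 * 7 = 84"]},
    {"question": "12 * (3 + 4) = ?", "steps": ["", "12 * 7 = 84"]},
]
print(filter_cot_samples(samples, keep=1))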

Training Configuration

Our training framework builds on previous advancements in s1-1k, LIMO, and Light-R1, and is implemented with LLaMA-Factory to leverage its proven scalability. The framework incorporates the DeepSeek-R1 chat template, FlashAttention-2, and Liger-Kernel to improve computational efficiency while reducing memory requirements. All experiments were conducted on a 2×8 H800 GPU cluster (16 GPUs), with performance evaluations executed using the SkyThought benchmarking suite.

The training configuration for GRPO is as follows:

  • Context Length: 16,384 tokens
  • Learning Rate: 5e-6
  • Batch Size: 128
  • Epochs: 10
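
The exact training config is not published on this card. As a rough guide, the listed hyperparameters map onto Hugging Face TrainingArguments (which LLaMA-Factory builds on) as sketched below; the per-device/accumulation split is an assumption chosen so that 16 GPUs × 1 sample × 8 accumulation steps = 128, and the output path and bf16 setting are likewise assumptions.

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="zmath-32b-grpo",        # hypothetical output path
    learning_rate=5e-6,
    num_train_epochs=10,
    per_device_train_batch_size=1,      # assumed split: 16 GPUs x 1 x 8 accum = 128
    gradient_accumulation_steps=8,
    bf16=True,                          # assumed mixed precision on H800
)

The 16,384-token context length would be enforced at data-processing time rather than through TrainingArguments (in LLaMA-Factory, via its cutoff_len option).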

Usage

You can load the model using the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace with the actual path to your model on Hugging Face.
model_name = "your-org/ZMath-32B"

# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the model (with multi-GPU support and automatic allocation to available devices).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,         # Use float16 precision to save GPU memory.
    device_map="auto",                 # Automatically distribute the model across multiple GPUs.
    trust_remote_code=True,
)

# Example inference.
prompt = "Solve the following math problem step by step: 12 * (3 + 4) = ?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
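
Since the base model is chat-tuned, prompts generally work better when wrapped in the tokenizer's chat template (assuming the released checkpoint keeps the DeepSeek-R1-Distill template) than as raw strings:

# Assumes the checkpoint ships a chat template, as the base model does.
messages = [{"role": "user", "content": "Solve step by step: 12 * (3 + 4) = ?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=512)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))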

Paper Link

Institution

  • ZTE-AIM

Model Contact
