Qwen 2.5 3B Reasoner
This model is a fine-tuned version of the Qwen 2.5 3B model optimized for reasoning tasks.
Training Details
- Base model: Qwen 2.5 3B
- Training method: GRPO (Generalized Rejection on Policy Optimization)
- Training focus: Enhanced reasoning capabilities
- Training parameters: [Add any important hyperparameters]
Intended Use
This model is designed for tasks requiring strong reasoning abilities, including logical problems, multi-step reasoning chains, and complex decision-making scenarios.
Limitations
[Note any known limitations]
- Downloads last month
- 19
Hardware compatibility
Log In
to view the estimation
4-bit
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
HF Inference deployability: The model has no library tag.