Mistral-Small-Reasoning
This model is a fine-tuned version of mistralai/Mistral-Small-24B-Instruct-2501, optimized for mathematical reasoning tasks. It was fine-tuned on the OpenR1-Math-220k and s1K-1.1 datasets to strengthen its reasoning capabilities.
Model Details
Model Description
- Developed by: Yenting Lin
- Funded by: Ubitus
- Model type: Instruction-tuned language model for reasoning
- Language(s) (NLP): English (en)
- License: Apache 2.0
- Finetuned from model: mistralai/Mistral-Small-24B-Instruct-2501
How to Get Started with the Model
A demo is available at twllm.com, and inference can be run using vLLM or sglang.
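Below is a minimal sketch of offline inference with vLLM. It assumes the repo id from the citation section (`yentinglin/Mistral-Small-24B-Instruct-2501-reasoning`); the tensor-parallel size and sampling parameters are placeholders to adjust for your hardware.

```python
# Minimal vLLM inference sketch (assumption: repo id taken from the citation below;
# tensor_parallel_size and sampling settings are illustrative, not prescribed).
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "yentinglin/Mistral-Small-24B-Instruct-2501-reasoning"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm = LLM(model=MODEL_ID, tensor_parallel_size=2)  # adjust to your GPU count

messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```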
Training Details
The model was trained on 4×8 H100 GPUs (32 GPUs in total), provided by Ubitus.
Training config (axolotl version `a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`):
```yaml
base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: "<pad>"
```
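The two `datasets` entries above use different message schemas: the s1K split already stores `role`/`content`, while OpenR1-Math-220k uses `from`/`value` (hence the `message_field_role`/`message_field_content` overrides). The sketch below is illustrative rather than part of the training code and assumes the default dataset config, split, and column names; it normalizes OpenR1 rows to `role`/`content` and renders them with the base model's chat template.

```python
# Illustrative only: normalize OpenR1-Math-220k rows to role/content keys,
# mirroring the message_field_role="from" / message_field_content="value"
# mapping in the config above, then render with the base chat template.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

# Assumption: the default config exposes a "train" split with a "messages" column.
ds = load_dataset("open-r1/OpenR1-Math-220k", split="train")

def normalize(example):
    # Role values may still need remapping to "user"/"assistant" if the raw
    # field uses other labels; adjust as needed for the chat template.
    return {
        "messages": [
            {"role": m["from"], "content": m["value"]} for m in example["messages"]
        ]
    }

ds = ds.map(normalize)
print(tokenizer.apply_chat_template(ds[0]["messages"], tokenize=False)[:500])
```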
Evaluation
The evaluation code is available at Hugging Face Open-R1. Note that I have updated the AIME 25 dataset to the full problem set, available at AIME 2025.
Our results below are averaged over multiple runs; see our eval details here.
| Model (Pass@1) | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|---|---|---|---|---|---|
| Mistral-24B-Reasoning (Ours) | 24B | 95.0 | 53.33 | 66.67 | 62.02 |
| Mistral-24B-Instruct | 24B | 70.6 | - | - | 45.3 |
| s1.1-32B | 32B | 93.2 | 40.0 | 56.7 | 61.62 |
| LIMO | 32B | 94.8 | 36.67 | 57.1 | 59.09 |
| DeepSeek-R1-Distill-Llama-70B | 70B | 94.5 | 46.67 | 70.0 | 65.2 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 94.3 | 60.0 | 72.6 | 62.1 |
| DeepSeek-R1 | 671B | 97.3 | 70.0 | 72.6 | 71.5 |
| o1 | - | 96.4 | 79.0 | - | 75.7 |
| o3-mini (high) | - | 97.9 | 86.5 | - | 77.2 |
| o3-mini (medium) | - | 97.3 | 76.5 | - | 74.9 |
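For context on how the table's numbers are produced, here is a minimal sketch of pass@1 averaged over multiple runs. It is illustrative only; the actual evaluation uses the Open-R1 harness linked above, and `generate_answer` / `is_correct` are hypothetical placeholders for its sampling and answer-checking steps.

```python
# Illustrative sketch: pass@1 estimated as the fraction of problems answered
# correctly in a run, then averaged over several independent runs.
# `generate_answer` and `is_correct` are placeholders for the real harness.
from statistics import mean

def pass_at_1(problems, generate_answer, is_correct, n_runs=4):
    run_scores = []
    for _ in range(n_runs):
        correct = [
            is_correct(problem, generate_answer(problem)) for problem in problems
        ]
        run_scores.append(mean(correct))  # accuracy of this run
    return mean(run_scores)  # average over runs, as reported in the table
```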
Citation
If you use this model, please cite:
```bibtex
@article{yentinglin2025_mistral_reasoning,
  author  = {Yenting Lin},
  title   = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year    = {2025},
  url     = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}
```