---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
---
# Model Card for Mistral-Small-3.1-24B-Base-2503 (TEXT ONLY)
This is the text-only variant of [mistralai/Mistral-Small-3.1-24B-Base-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).
This also serves as the base model for [mistralai/Devstral-Small-2505](https://huggingface.co/mistralai/Devstral-Small-2505), for which no official base model was released.
Features:
- Text-only, no multimodality.
- 128k context length.
How was the text-only model produced? The vision encoder was removed and the model architecture was converted from `mistral3` to `mistral`; the tokenizer was not modified. A sketch of what such a conversion looks like follows.
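For illustration only, here is a minimal sketch of the conversion, assuming the `mistral3` checkpoint stores its text weights under a `language_model.` key prefix and its vision weights under separate prefixes (the author's actual script and the exact key layout may differ):
```
# Hypothetical conversion sketch -- not the author's published script.
# Assumption: text weights carry a "language_model." key prefix; everything
# else (vision encoder, projector) is dropped.
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")  # repeat per shard for a sharded checkpoint
PREFIX = "language_model."

# Keep only the text tensors, renaming keys to match a plain MistralForCausalLM.
text_only = {
    key.removeprefix(PREFIX): tensor
    for key, tensor in state.items()
    if key.startswith(PREFIX)
}
save_file(text_only, "model-text-only.safetensors")
```
The `config.json` would need the matching edit: `model_type` set to `mistral` and the vision-related fields dropped. Since the tokenizer is unchanged, it is copied over as-is.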
## Reproduced eval
Serve the text-only model with vLLM:
```
vllm serve casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only
```
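Once the server is up, vLLM exposes an OpenAI-compatible API on port 8000. A quick smoke test (prompt and settings here are arbitrary):
```
# Smoke test against the local vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM needs no real key by default
completion = client.completions.create(
    model="casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only",
    prompt="The capital of France is",
    max_tokens=8,
    temperature=0.0,
)
print(completion.choices[0].text)
```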
The reproduced results are summarized below; the text-only variant matches the multimodal original well within the reported standard error.
| Model | MMLU (0-shot) |
|---------------------------------|----------------|
| Small 3.1 24B Base (Text Only)  | 77.25% ± 0.33% |
| Small 3.1 24B Base (Multimodal) | 77.34% ± 0.33% |
### Original Multimodal: Full MMLU (Reproduced)
For this baseline, the original multimodal model was served with vLLM in the same way (`vllm serve mistralai/Mistral-Small-3.1-24B-Base-2503`) before pointing `lm_eval` at the local endpoint:
```
lm_eval --model local-completions \
--model_args "base_url=http://localhost:8000/v1/completions,model=mistralai/Mistral-Small-3.1-24B-Base-2503" \
--tasks mmlu \
--batch_size 128
```
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7734|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6820|± |0.0062|
| - formal_logic | 1|none | 0|acc |↑ |0.5714|± |0.0443|
| - high_school_european_history | 1|none | 0|acc |↑ |0.8303|± |0.0293|
| - high_school_us_history | 1|none | 0|acc |↑ |0.9363|± |0.0171|
| - high_school_world_history | 1|none | 0|acc |↑ |0.9241|± |0.0172|
| - international_law | 1|none | 0|acc |↑ |0.9091|± |0.0262|
| - jurisprudence | 1|none | 0|acc |↑ |0.8148|± |0.0376|
| - logical_fallacies | 1|none | 0|acc |↑ |0.8589|± |0.0274|
| - moral_disputes | 1|none | 0|acc |↑ |0.8208|± |0.0206|
| - moral_scenarios | 1|none | 0|acc |↑ |0.3844|± |0.0163|
| - philosophy | 1|none | 0|acc |↑ |0.8296|± |0.0214|
| - prehistory | 1|none | 0|acc |↑ |0.8704|± |0.0187|
| - professional_law | 1|none | 0|acc |↑ |0.6095|± |0.0125|
| - world_religions | 1|none | 0|acc |↑ |0.8713|± |0.0257|
| - other | 2|none | |acc |↑ |0.8317|± |0.0064|
| - business_ethics | 1|none | 0|acc |↑ |0.8200|± |0.0386|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.8679|± |0.0208|
| - college_medicine | 1|none | 0|acc |↑ |0.7803|± |0.0316|
| - global_facts | 1|none | 0|acc |↑ |0.6600|± |0.0476|
| - human_aging | 1|none | 0|acc |↑ |0.7982|± |0.0269|
| - management | 1|none | 0|acc |↑ |0.9029|± |0.0293|
| - marketing | 1|none | 0|acc |↑ |0.9359|± |0.0160|
| - medical_genetics | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - miscellaneous | 1|none | 0|acc |↑ |0.9183|± |0.0098|
| - nutrition | 1|none | 0|acc |↑ |0.8791|± |0.0187|
| - professional_accounting | 1|none | 0|acc |↑ |0.6277|± |0.0288|
| - professional_medicine | 1|none | 0|acc |↑ |0.8603|± |0.0211|
| - virology | 1|none | 0|acc |↑ |0.5602|± |0.0386|
| - social sciences | 2|none | |acc |↑ |0.8736|± |0.0059|
| - econometrics | 1|none | 0|acc |↑ |0.6491|± |0.0449|
| - high_school_geography | 1|none | 0|acc |↑ |0.8990|± |0.0215|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.9637|± |0.0135|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.8103|± |0.0199|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.9034|± |0.0192|
| - high_school_psychology | 1|none | 0|acc |↑ |0.9358|± |0.0105|
| - human_sexuality | 1|none | 0|acc |↑ |0.8855|± |0.0279|
| - professional_psychology | 1|none | 0|acc |↑ |0.8578|± |0.0141|
| - public_relations | 1|none | 0|acc |↑ |0.7909|± |0.0390|
| - security_studies | 1|none | 0|acc |↑ |0.8327|± |0.0239|
| - sociology | 1|none | 0|acc |↑ |0.9154|± |0.0197|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.9300|± |0.0256|
| - stem | 2|none | |acc |↑ |0.7545|± |0.0073|
| - abstract_algebra | 1|none | 0|acc |↑ |0.4600|± |0.0501|
| - anatomy | 1|none | 0|acc |↑ |0.8148|± |0.0336|
| - astronomy | 1|none | 0|acc |↑ |0.9211|± |0.0219|
| - college_biology | 1|none | 0|acc |↑ |0.9444|± |0.0192|
| - college_chemistry | 1|none | 0|acc |↑ |0.5700|± |0.0498|
| - college_computer_science | 1|none | 0|acc |↑ |0.7100|± |0.0456|
| - college_mathematics | 1|none | 0|acc |↑ |0.6200|± |0.0488|
| - college_physics | 1|none | 0|acc |↑ |0.6569|± |0.0472|
| - computer_security | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - conceptual_physics | 1|none | 0|acc |↑ |0.8170|± |0.0253|
| - electrical_engineering | 1|none | 0|acc |↑ |0.7931|± |0.0338|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.7910|± |0.0209|
| - high_school_biology | 1|none | 0|acc |↑ |0.9323|± |0.0143|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.7586|± |0.0301|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.5185|± |0.0305|
| - high_school_physics | 1|none | 0|acc |↑ |0.6291|± |0.0394|
| - high_school_statistics | 1|none | 0|acc |↑ |0.7593|± |0.0292|
| - machine_learning | 1|none | 0|acc |↑ |0.6250|± |0.0460|

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7734|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6820|± |0.0062|
| - other | 2|none | |acc |↑ |0.8317|± |0.0064|
| - social sciences| 2|none | |acc |↑ |0.8736|± |0.0059|
| - stem | 2|none | |acc |↑ |0.7545|± |0.0073|
### Text Only: Full MMLU
```
lm_eval --model local-completions \
--model_args "base_url=http://localhost:8000/v1/completions,model=casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only" \
--tasks mmlu \
--batch_size 128
```
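The same run can also be driven from Python via the harness's `simple_evaluate` API; a sketch mirroring the CLI flags above (the result-key layout shown assumes recent lm-evaluation-harness versions):
```
# Programmatic equivalent of the CLI invocation above (sketch).
import lm_eval

results = lm_eval.simple_evaluate(
    model="local-completions",
    model_args=(
        "base_url=http://localhost:8000/v1/completions,"
        "model=casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only"
    ),
    tasks=["mmlu"],
    batch_size=128,
)
print(results["results"]["mmlu"]["acc,none"])  # aggregate accuracy, e.g. 0.7725
```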
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7725|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6793|± |0.0062|
| - formal_logic | 1|none | 0|acc |↑ |0.5397|± |0.0446|
| - high_school_european_history | 1|none | 0|acc |↑ |0.8364|± |0.0289|
| - high_school_us_history | 1|none | 0|acc |↑ |0.9363|± |0.0171|
| - high_school_world_history | 1|none | 0|acc |↑ |0.9198|± |0.0177|
| - international_law | 1|none | 0|acc |↑ |0.9008|± |0.0273|
| - jurisprudence | 1|none | 0|acc |↑ |0.8148|± |0.0376|
| - logical_fallacies | 1|none | 0|acc |↑ |0.8405|± |0.0288|
| - moral_disputes | 1|none | 0|acc |↑ |0.8237|± |0.0205|
| - moral_scenarios | 1|none | 0|acc |↑ |0.3765|± |0.0162|
| - philosophy | 1|none | 0|acc |↑ |0.8264|± |0.0215|
| - prehistory | 1|none | 0|acc |↑ |0.8704|± |0.0187|
| - professional_law | 1|none | 0|acc |↑ |0.6108|± |0.0125|
| - world_religions | 1|none | 0|acc |↑ |0.8713|± |0.0257|
| - other | 2|none | |acc |↑ |0.8339|± |0.0064|
| - business_ethics | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.8679|± |0.0208|
| - college_medicine | 1|none | 0|acc |↑ |0.7746|± |0.0319|
| - global_facts | 1|none | 0|acc |↑ |0.6800|± |0.0469|
| - human_aging | 1|none | 0|acc |↑ |0.8027|± |0.0267|
| - management | 1|none | 0|acc |↑ |0.9029|± |0.0293|
| - marketing | 1|none | 0|acc |↑ |0.9402|± |0.0155|
| - medical_genetics | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - miscellaneous | 1|none | 0|acc |↑ |0.9208|± |0.0097|
| - nutrition | 1|none | 0|acc |↑ |0.8791|± |0.0187|
| - professional_accounting | 1|none | 0|acc |↑ |0.6312|± |0.0288|
| - professional_medicine | 1|none | 0|acc |↑ |0.8603|± |0.0211|
| - virology | 1|none | 0|acc |↑ |0.5602|± |0.0386|
| - social sciences | 2|none | |acc |↑ |0.8739|± |0.0059|
| - econometrics | 1|none | 0|acc |↑ |0.6667|± |0.0443|
| - high_school_geography | 1|none | 0|acc |↑ |0.8939|± |0.0219|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.9585|± |0.0144|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.8103|± |0.0199|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.9076|± |0.0188|
| - high_school_psychology | 1|none | 0|acc |↑ |0.9358|± |0.0105|
| - human_sexuality | 1|none | 0|acc |↑ |0.8855|± |0.0279|
| - professional_psychology | 1|none | 0|acc |↑ |0.8578|± |0.0141|
| - public_relations | 1|none | 0|acc |↑ |0.7909|± |0.0390|
| - security_studies | 1|none | 0|acc |↑ |0.8327|± |0.0239|
| - sociology | 1|none | 0|acc |↑ |0.9104|± |0.0202|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.9400|± |0.0239|
| - stem | 2|none | |acc |↑ |0.7520|± |0.0073|
| - abstract_algebra | 1|none | 0|acc |↑ |0.4500|± |0.0500|
| - anatomy | 1|none | 0|acc |↑ |0.8296|± |0.0325|
| - astronomy | 1|none | 0|acc |↑ |0.9211|± |0.0219|
| - college_biology | 1|none | 0|acc |↑ |0.9444|± |0.0192|
| - college_chemistry | 1|none | 0|acc |↑ |0.5600|± |0.0499|
| - college_computer_science | 1|none | 0|acc |↑ |0.7100|± |0.0456|
| - college_mathematics | 1|none | 0|acc |↑ |0.6200|± |0.0488|
| - college_physics | 1|none | 0|acc |↑ |0.6569|± |0.0472|
| - computer_security | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - conceptual_physics | 1|none | 0|acc |↑ |0.8213|± |0.0250|
| - electrical_engineering | 1|none | 0|acc |↑ |0.7862|± |0.0342|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.7804|± |0.0213|
| - high_school_biology | 1|none | 0|acc |↑ |0.9290|± |0.0146|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.7488|± |0.0305|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.5222|± |0.0305|
| - high_school_physics | 1|none | 0|acc |↑ |0.6225|± |0.0396|
| - high_school_statistics | 1|none | 0|acc |↑ |0.7500|± |0.0295|
| - machine_learning | 1|none | 0|acc |↑ |0.6339|± |0.0457|

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7725|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6793|± |0.0062|
| - other | 2|none | |acc |↑ |0.8339|± |0.0064|
| - social sciences| 2|none | |acc |↑ |0.8739|± |0.0059|
| - stem | 2|none | |acc |↑ |0.7520|± |0.0073|