---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
---
|
# Model Card for Mistral-Small-3.1-24B-Base-2503 (TEXT ONLY)
|
This is the text-only variant of [mistralai/Mistral-Small-3.1-24B-Base-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).
It also serves as the base model for [mistralai/Devstral-Small-2505](https://huggingface.co/mistralai/Devstral-Small-2505), which was released without an official base model.

Features:
- Text-only, no multimodality.
- 128k context length.

How was the text-only model created? The vision encoder was removed and the architecture was converted from `mistral3` to `mistral`. The tokenizer was not modified.
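The config-level half of that conversion can be sketched as below. This is a hypothetical illustration, not the actual conversion script: real `config.json` key names may differ, and the weight checkpoint (vision tower, multimodal projector) must be remapped separately.

```python
# Sketch: turning a mistral3 (multimodal) config into a plain mistral
# (text-only) config. Hypothetical illustration only -- the real conversion
# also drops the vision-tower/projector tensors from the checkpoint.

def to_text_only(mm_config: dict) -> dict:
    """Promote the nested text config and relabel the architecture.

    The vision_config section is dropped by construction, since we start
    from the text sub-config alone.
    """
    text_cfg = dict(mm_config.get("text_config", {}))
    text_cfg["architectures"] = ["MistralForCausalLM"]
    text_cfg["model_type"] = "mistral"
    return text_cfg

# Toy multimodal config for illustration.
mm_config = {
    "architectures": ["Mistral3ForConditionalGeneration"],
    "model_type": "mistral3",
    "text_config": {"model_type": "mistral", "hidden_size": 5120},
    "vision_config": {"hidden_size": 1024},
}

text_config = to_text_only(mm_config)
print(text_config["model_type"])       # -> mistral
print("vision_config" in text_config)  # -> False
```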
|
|
## Reproduced eval

Serve with vLLM:

```shell
vllm serve casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only
```
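Once the server is up, the model can be queried through vLLM's OpenAI-compatible `/v1/completions` endpoint. A minimal stdlib-only sketch (assumes the default `localhost:8000` address; `build_completion_request` and `complete` are hypothetical helper names):

```python
import json
import os
import urllib.request

MODEL = "casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only"

def build_completion_request(prompt: str, max_tokens: int = 64) -> dict:
    """Request body for the OpenAI-compatible /v1/completions endpoint."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST a completion request and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_completion_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Only hit the network when a server address is explicitly provided.
if os.environ.get("VLLM_BASE_URL"):
    print(complete("The capital of France is", base_url=os.environ["VLLM_BASE_URL"]))
```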
|
|
The reproduced results can be seen below.

| Model | MMLU (0-shot) |
|------------------------------------|-----------------|
| Small 3.1 24B Base (Text Only) | 77.25% ± 0.33% |
| Small 3.1 24B Base (Multimodal) | 77.34% ± 0.33% |
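The 0.09-point gap is well within the reported standard errors; treating the two stderrs as independent, a quick z-check confirms the difference is not significant:

```python
import math

# Reported MMLU accuracy and standard error (fraction scale) from the table above.
text_only_acc, text_only_se = 0.7725, 0.0033
multi_acc, multi_se = 0.7734, 0.0033

diff = multi_acc - text_only_acc
se_diff = math.sqrt(text_only_se**2 + multi_se**2)
z = diff / se_diff

print(f"diff = {diff:.4f}, z = {z:.2f}")  # z is far below 1.96, i.e. not significant
```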
|
|
### Original Multimodal: Full MMLU (Reproduced)

```shell
lm_eval --model local-completions \
  --model_args "base_url=http://localhost:8000/v1/completions,model=mistralai/Mistral-Small-3.1-24B-Base-2503" \
  --tasks mmlu \
  --batch_size 128
```
|
|
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7734|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6820|± |0.0062|
| - formal_logic | 1|none | 0|acc |↑ |0.5714|± |0.0443|
| - high_school_european_history | 1|none | 0|acc |↑ |0.8303|± |0.0293|
| - high_school_us_history | 1|none | 0|acc |↑ |0.9363|± |0.0171|
| - high_school_world_history | 1|none | 0|acc |↑ |0.9241|± |0.0172|
| - international_law | 1|none | 0|acc |↑ |0.9091|± |0.0262|
| - jurisprudence | 1|none | 0|acc |↑ |0.8148|± |0.0376|
| - logical_fallacies | 1|none | 0|acc |↑ |0.8589|± |0.0274|
| - moral_disputes | 1|none | 0|acc |↑ |0.8208|± |0.0206|
| - moral_scenarios | 1|none | 0|acc |↑ |0.3844|± |0.0163|
| - philosophy | 1|none | 0|acc |↑ |0.8296|± |0.0214|
| - prehistory | 1|none | 0|acc |↑ |0.8704|± |0.0187|
| - professional_law | 1|none | 0|acc |↑ |0.6095|± |0.0125|
| - world_religions | 1|none | 0|acc |↑ |0.8713|± |0.0257|
| - other | 2|none | |acc |↑ |0.8317|± |0.0064|
| - business_ethics | 1|none | 0|acc |↑ |0.8200|± |0.0386|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.8679|± |0.0208|
| - college_medicine | 1|none | 0|acc |↑ |0.7803|± |0.0316|
| - global_facts | 1|none | 0|acc |↑ |0.6600|± |0.0476|
| - human_aging | 1|none | 0|acc |↑ |0.7982|± |0.0269|
| - management | 1|none | 0|acc |↑ |0.9029|± |0.0293|
| - marketing | 1|none | 0|acc |↑ |0.9359|± |0.0160|
| - medical_genetics | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - miscellaneous | 1|none | 0|acc |↑ |0.9183|± |0.0098|
| - nutrition | 1|none | 0|acc |↑ |0.8791|± |0.0187|
| - professional_accounting | 1|none | 0|acc |↑ |0.6277|± |0.0288|
| - professional_medicine | 1|none | 0|acc |↑ |0.8603|± |0.0211|
| - virology | 1|none | 0|acc |↑ |0.5602|± |0.0386|
| - social sciences | 2|none | |acc |↑ |0.8736|± |0.0059|
| - econometrics | 1|none | 0|acc |↑ |0.6491|± |0.0449|
| - high_school_geography | 1|none | 0|acc |↑ |0.8990|± |0.0215|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.9637|± |0.0135|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.8103|± |0.0199|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.9034|± |0.0192|
| - high_school_psychology | 1|none | 0|acc |↑ |0.9358|± |0.0105|
| - human_sexuality | 1|none | 0|acc |↑ |0.8855|± |0.0279|
| - professional_psychology | 1|none | 0|acc |↑ |0.8578|± |0.0141|
| - public_relations | 1|none | 0|acc |↑ |0.7909|± |0.0390|
| - security_studies | 1|none | 0|acc |↑ |0.8327|± |0.0239|
| - sociology | 1|none | 0|acc |↑ |0.9154|± |0.0197|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.9300|± |0.0256|
| - stem | 2|none | |acc |↑ |0.7545|± |0.0073|
| - abstract_algebra | 1|none | 0|acc |↑ |0.4600|± |0.0501|
| - anatomy | 1|none | 0|acc |↑ |0.8148|± |0.0336|
| - astronomy | 1|none | 0|acc |↑ |0.9211|± |0.0219|
| - college_biology | 1|none | 0|acc |↑ |0.9444|± |0.0192|
| - college_chemistry | 1|none | 0|acc |↑ |0.5700|± |0.0498|
| - college_computer_science | 1|none | 0|acc |↑ |0.7100|± |0.0456|
| - college_mathematics | 1|none | 0|acc |↑ |0.6200|± |0.0488|
| - college_physics | 1|none | 0|acc |↑ |0.6569|± |0.0472|
| - computer_security | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - conceptual_physics | 1|none | 0|acc |↑ |0.8170|± |0.0253|
| - electrical_engineering | 1|none | 0|acc |↑ |0.7931|± |0.0338|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.7910|± |0.0209|
| - high_school_biology | 1|none | 0|acc |↑ |0.9323|± |0.0143|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.7586|± |0.0301|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.5185|± |0.0305|
| - high_school_physics | 1|none | 0|acc |↑ |0.6291|± |0.0394|
| - high_school_statistics | 1|none | 0|acc |↑ |0.7593|± |0.0292|
| - machine_learning | 1|none | 0|acc |↑ |0.6250|± |0.0460|

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7734|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6820|± |0.0062|
| - other | 2|none | |acc |↑ |0.8317|± |0.0064|
| - social sciences| 2|none | |acc |↑ |0.8736|± |0.0059|
| - stem | 2|none | |acc |↑ |0.7545|± |0.0073|
|
|
### Text Only: Full MMLU (Reproduced)

```shell
lm_eval --model local-completions \
  --model_args "base_url=http://localhost:8000/v1/completions,model=casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only" \
  --tasks mmlu \
  --batch_size 128
```
|
|
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7725|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6793|± |0.0062|
| - formal_logic | 1|none | 0|acc |↑ |0.5397|± |0.0446|
| - high_school_european_history | 1|none | 0|acc |↑ |0.8364|± |0.0289|
| - high_school_us_history | 1|none | 0|acc |↑ |0.9363|± |0.0171|
| - high_school_world_history | 1|none | 0|acc |↑ |0.9198|± |0.0177|
| - international_law | 1|none | 0|acc |↑ |0.9008|± |0.0273|
| - jurisprudence | 1|none | 0|acc |↑ |0.8148|± |0.0376|
| - logical_fallacies | 1|none | 0|acc |↑ |0.8405|± |0.0288|
| - moral_disputes | 1|none | 0|acc |↑ |0.8237|± |0.0205|
| - moral_scenarios | 1|none | 0|acc |↑ |0.3765|± |0.0162|
| - philosophy | 1|none | 0|acc |↑ |0.8264|± |0.0215|
| - prehistory | 1|none | 0|acc |↑ |0.8704|± |0.0187|
| - professional_law | 1|none | 0|acc |↑ |0.6108|± |0.0125|
| - world_religions | 1|none | 0|acc |↑ |0.8713|± |0.0257|
| - other | 2|none | |acc |↑ |0.8339|± |0.0064|
| - business_ethics | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.8679|± |0.0208|
| - college_medicine | 1|none | 0|acc |↑ |0.7746|± |0.0319|
| - global_facts | 1|none | 0|acc |↑ |0.6800|± |0.0469|
| - human_aging | 1|none | 0|acc |↑ |0.8027|± |0.0267|
| - management | 1|none | 0|acc |↑ |0.9029|± |0.0293|
| - marketing | 1|none | 0|acc |↑ |0.9402|± |0.0155|
| - medical_genetics | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - miscellaneous | 1|none | 0|acc |↑ |0.9208|± |0.0097|
| - nutrition | 1|none | 0|acc |↑ |0.8791|± |0.0187|
| - professional_accounting | 1|none | 0|acc |↑ |0.6312|± |0.0288|
| - professional_medicine | 1|none | 0|acc |↑ |0.8603|± |0.0211|
| - virology | 1|none | 0|acc |↑ |0.5602|± |0.0386|
| - social sciences | 2|none | |acc |↑ |0.8739|± |0.0059|
| - econometrics | 1|none | 0|acc |↑ |0.6667|± |0.0443|
| - high_school_geography | 1|none | 0|acc |↑ |0.8939|± |0.0219|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.9585|± |0.0144|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.8103|± |0.0199|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.9076|± |0.0188|
| - high_school_psychology | 1|none | 0|acc |↑ |0.9358|± |0.0105|
| - human_sexuality | 1|none | 0|acc |↑ |0.8855|± |0.0279|
| - professional_psychology | 1|none | 0|acc |↑ |0.8578|± |0.0141|
| - public_relations | 1|none | 0|acc |↑ |0.7909|± |0.0390|
| - security_studies | 1|none | 0|acc |↑ |0.8327|± |0.0239|
| - sociology | 1|none | 0|acc |↑ |0.9104|± |0.0202|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.9400|± |0.0239|
| - stem | 2|none | |acc |↑ |0.7520|± |0.0073|
| - abstract_algebra | 1|none | 0|acc |↑ |0.4500|± |0.0500|
| - anatomy | 1|none | 0|acc |↑ |0.8296|± |0.0325|
| - astronomy | 1|none | 0|acc |↑ |0.9211|± |0.0219|
| - college_biology | 1|none | 0|acc |↑ |0.9444|± |0.0192|
| - college_chemistry | 1|none | 0|acc |↑ |0.5600|± |0.0499|
| - college_computer_science | 1|none | 0|acc |↑ |0.7100|± |0.0456|
| - college_mathematics | 1|none | 0|acc |↑ |0.6200|± |0.0488|
| - college_physics | 1|none | 0|acc |↑ |0.6569|± |0.0472|
| - computer_security | 1|none | 0|acc |↑ |0.8300|± |0.0378|
| - conceptual_physics | 1|none | 0|acc |↑ |0.8213|± |0.0250|
| - electrical_engineering | 1|none | 0|acc |↑ |0.7862|± |0.0342|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.7804|± |0.0213|
| - high_school_biology | 1|none | 0|acc |↑ |0.9290|± |0.0146|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.7488|± |0.0305|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.5222|± |0.0305|
| - high_school_physics | 1|none | 0|acc |↑ |0.6225|± |0.0396|
| - high_school_statistics | 1|none | 0|acc |↑ |0.7500|± |0.0295|
| - machine_learning | 1|none | 0|acc |↑ |0.6339|± |0.0457|

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7725|± |0.0033|
| - humanities | 2|none | |acc |↑ |0.6793|± |0.0062|
| - other | 2|none | |acc |↑ |0.8339|± |0.0064|
| - social sciences| 2|none | |acc |↑ |0.8739|± |0.0059|
| - stem | 2|none | |acc |↑ |0.7520|± |0.0073|
|