library_name: transformers
license: llama3.1
language:
- et
- en
base_model:
- meta-llama/Llama-3.1-8B
pipeline_tag: text-generation
Model Card for llama-estllm-protype-0825
llama-estllm-protype-0825 is the first artifact produced by the EstLLM project. The intention of this release is to evaluate the first prototype in a conversational ChatbotArena-style setting on baromeeter.ai, and thus establish a baseline for future improvements.
The model underwent continuous pre-training starting from Llama-3.1-8B on approximately 35B tokens, then supervised fine-tuning and direct preference optimization were applied.
Model Details
Model Description
- Developed by: TartuNLP and TalTechNLP research groups
- Funded by: Estonian Ministry of Education and Research, “Estonian Language Technology Program 2018-2027”
- Model type: Causal Language Model, Instruction-following
- Language(s) (NLP): Estonian, English
- License: Llama 3.1 Community License Agreement
- Finetuned from model meta-llama/Llama-3.1-8B
Evaluation
Logits-based
Scores for logits-based evaluation benchmarks are available on the EuroEval leaderboard.
Generative
Every benchmark in this category is treated as a generative problem, and thus the evaluation is performed on the model responses obtained with 0 temperature (not logits). The top scores are higlighted with bold. Second best scores are highlighted with italic bold. Rows are sorted in descending order based on the number of parameters of models (not scores). The test set is used for evaluation of each dataset unless noted otherwise.
Instruction-following
Instruction level strict accuracy is reported for IFEval-et.
Model (# parameters ↓) | IFEval-et* |
---|---|
moonshotai/Kimi-K2-Instruct | 0.7891 |
deepseek-ai/DeepSeek-V3-0324 | 0.7171 |
meta-llama/Llama-3.1-405B-Instruct | 0.7159 |
meta-llama/Llama-3.3-70B-Instruct | 0.7705 |
Qwen/Qwen2.5-72B-Instruct | 0.7407 |
google/gemma-3-27b-it | 0.7655 |
utter-project/EuroLLM-9B-Instruct | 0.5397 |
meta-llama/Llama-3.1-8B-Instruct | 0.3797 |
tartuNLP/llama-estlm-prototype-0825 | 0.5174 |
BSC-LT/salamandra-7b-instruct | 0.5195 |
tartuNLP/Llammas | 0.3524 |
Qwen/Qwen2.5-7B-Instruct | 0.4988 |
Multiple Choice
All datasets except Winogrande-et are evaluated in 0-shot mode. Winogrande-et is evaluated in 3-shot mode. Exact match accuracy is reported for every dataset.
Model (# parameters ↓) | Winogrande-et | Trivia-et | Grammar-et | Inflection-et | Word-Meanings-et |
---|---|---|---|---|---|
moonshotai/Kimi-K2-Instruct | 0.8138 | 0.4225 | 0.916 | 0.6458 | 0.9689 |
deepseek-ai/DeepSeek-V3-0324 | 0.8042 | 0.27 | 0.364 | 0 | 0 |
meta-llama/Llama-3.1-405B-Instruct | 0.7878 | 0.4713 | 0.818 | 0.9089 | 0.9438 |
meta-llama/Llama-3.3-70B-Instruct | 0.7397 | 0.3875 | 0.797 | 0.6421 | 0.9408 |
Qwen/Qwen2.5-72B-Instruct | 0.7227 | 0.315 | 0.694 | 0.5208 | 0.9057 |
google/gemma-3-27b-it | 0.7510 | 0.325 | 0.817 | 0.5934 | 0.9529 |
utter-project/EuroLLM-9B-Instruct | 0.5846 | 0.3738 | 0.764 | 0.367 | 0.9258 |
meta-llama/Llama-3.1-8B-Instruct | 0.5399 | 0.2888 | 0.657 | 0.4165 | 0.8335 |
tartuNLP/llama-estlm-prototype-0825 | 0.5812 | 0.425 | 0.692 | 0.5188 | 0.9569 |
BSC-LT/salamandra-7b-instruct | 0.2878 | 0.2875 | 0.594 | 0.2668 | 0.8084 |
Qwen/Qwen2.5-7B-Instruct | 0.5473 | 0.2938 | 0.598 | 0.4136 | 0.7984 |
tartuNLP/Llammas | 0.5037 | 0.2838 | 0.529 | 0.2289 | 0.5326 |
Translation
English to Estonian
Model | wmt24pp (BLEU ↑) |
---|---|
BSC-LT/salamandraTA-7b-instruct | 0.2713 |
tartuNLP/llama-estlm-prototype-0825 | 0.264 |
utter-project/EuroLLM-9B-Instruct | 0.2602 |
tartuNLP/Llammas | 0.1472 |
meta-llama/Llama-3.1-8B-Instruct | 0.1406 |
BSC-LT/salamandra-7b-instruct | 0.1201 |
Qwen/Qwen2.5-7B-Instruct | 0.0476 |
Limitations
This is an early prototype version. Accordignly, it has limitations in addition to the base Llama limitations:
- Relatively short context of 4096 tokens. It's not expected to perform well on context sizes beyond that.
- Multi-turn conversations are not supported in this version.
- Trained with the original Llama 3.1 system prompt that has a hard-coded date cut-off.
Citation
TBA