stelterlab committed
Commit f422ee8 · verified · 1 Parent(s): cfe7eb5

Update README.md

added original model card - added quantization note

Files changed (1): README.md (+137 -3)

README.md CHANGED
@@ -1,3 +1,137 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
language:
- en
- de
- es
- fr
- it
- pt
- pl
- nl
- tr
- sv
- cs
- el
- hu
- ro
- fi
- uk
- sl
- sk
- da
- lt
- lv
- et
- bg
- 'no'
- ca
- hr
- ga
- mt
- gl
- zh
- ru
- ko
- ja
- ar
- hi
library_name: transformers
base_model:
- utter-project/EuroLLM-9B-Instruct
---

AWQ quantization: done by stelterlab in INT4 GEMM with AutoAWQ by casper-hansen (https://github.com/casper-hansen/AutoAWQ/)

Original weights by utter-project. The original model card follows:

# Model Card for EuroLLM-9B-Instruct

This is the model card for EuroLLM-9B-Instruct. You can also check the pre-trained version: [EuroLLM-9B](https://huggingface.co/utter-project/EuroLLM-9B).

- **Developed by:** Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
- **Funded by:** European Union.
- **Model type:** A 9B-parameter multilingual transformer LLM.
- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- **License:** Apache License 2.0.

## Model Details

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages, as well as some additional relevant languages.
EuroLLM-9B is a 9B-parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-9B-Instruct was further instruction-tuned on EuroBlocks, an instruction-tuning dataset with a focus on general instruction-following and machine translation.

### Model Description

EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.

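As an aside, the index arithmetic behind GQA is simple: with 32 query heads and 8 KV heads (the values in the hyper-parameter table), every 4 consecutive query heads share one KV head. A minimal sketch of that mapping, not EuroLLM's actual implementation:

```python
# Illustrative sketch of the GQA head grouping implied by the model card;
# this is not EuroLLM's code, just the index arithmetic.
N_Q_HEADS = 32   # "Number of Heads" in the table below
N_KV_HEADS = 8   # "Number of KV Heads (GQA)"
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # 4 query heads per KV head

def kv_head_for(query_head: int) -> int:
    """Return the KV head a given query head attends with."""
    return query_head // GROUP_SIZE

# Query heads 0-3 share KV head 0; heads 28-31 share KV head 7.
assert [kv_head_for(q) for q in (0, 3, 4, 31)] == [0, 0, 1, 7]
```

Sharing KV heads this way shrinks the KV cache by a factor of `GROUP_SIZE` at inference time, which is where the cited speed-up comes from.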
For pre-training, we use 400 NVIDIA H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 2,800 sequences, which corresponds to approximately 12 million tokens, using the Adam optimizer and BF16 precision.
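That token figure can be sanity-checked using only numbers stated in this card: 2,800 sequences at the 4,096-token sequence length is about 11.5 million tokens per batch, consistent with the quoted ~12 million:

```python
# Batch-size arithmetic from the pre-training paragraph above.
sequences_per_batch = 2_800
sequence_length = 4_096  # from the hyper-parameter table

tokens_per_batch = sequences_per_batch * sequence_length
print(f"{tokens_per_batch:,}")  # 11,468,800 tokens, i.e. roughly 12M as stated
```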
Here is a summary of the model hyper-parameters:

| Hyper-parameter          | Value                |
|--------------------------|----------------------|
| Sequence Length          | 4,096                |
| Number of Layers         | 42                   |
| Embedding Size           | 4,096                |
| FFN Hidden Size          | 12,288               |
| Number of Heads          | 32                   |
| Number of KV Heads (GQA) | 8                    |
| Activation Function      | SwiGLU               |
| Position Encodings       | RoPE (Θ = 10,000)    |
| Layer Norm               | RMSNorm              |
| Tied Embeddings          | No                   |
| Embedding Parameters     | 0.524B               |
| LM Head Parameters       | 0.524B               |
| Non-embedding Parameters | 8.105B               |
| Total Parameters         | 9.154B               |

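The parameter counts in the table are internally consistent, and, assuming the untied input embeddings and LM head each have shape vocab_size × embedding_size, they also pin the vocabulary size at roughly 128k. A quick check:

```python
# Cross-check of the parameter counts in the hyper-parameter table.
embedding_params = 0.524e9
lm_head_params = 0.524e9      # untied, so same shape as the input embeddings
non_embedding_params = 8.105e9

total_params = embedding_params + lm_head_params + non_embedding_params
print(f"{total_params / 1e9:.3f}B")  # 9.153B, matching 9.154B up to rounding

# Assumed shape: embedding_params = vocab_size * embedding_size
embedding_size = 4_096
vocab_size = embedding_params / embedding_size
print(round(vocab_size))  # ~127,930, i.e. a vocabulary of roughly 128k tokens
```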
## Run the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]

inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Results

### EU Languages

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/ob_1sLM8c7dxuwpv6AAHA.png)

**Table 1:** Comparison of open-weight LLMs on multilingual benchmarks. The Borda count corresponds to the average ranking of the models (see [Colombo et al., 2022](https://arxiv.org/abs/2202.03799)). For ARC-Challenge, HellaSwag, and MMLU we use the Okapi datasets ([Lai et al., 2023](https://aclanthology.org/2023.emnlp-demo.28/)), which include 11 languages. For MMLU-Pro and MUSR we translate the English version with Tower ([Alves et al., 2024](https://arxiv.org/abs/2402.17733)) into 6 EU languages.
\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 1 highlight EuroLLM-9B's superior performance on multilingual tasks compared to other European-developed models (as shown by the Borda count of 1.0), as well as its strong competitiveness with non-European models, achieving results comparable to Gemma-2-9B and outperforming the rest on most benchmarks.

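For readers unfamiliar with the metric: the Borda count here is each model's rank averaged over benchmarks, so lower is better and 1.0 means ranked first on every benchmark. A toy sketch of the computation, with made-up model names and scores:

```python
# Toy illustration of the average-rank ("Borda count") aggregation from the
# Table 1 caption. All model names and scores below are hypothetical.
scores = {
    "model_a": [0.71, 0.64, 0.58],  # per-benchmark scores, higher is better
    "model_b": [0.69, 0.66, 0.55],
    "model_c": [0.60, 0.61, 0.50],
}

n_benchmarks = len(next(iter(scores.values())))
ranks = {name: [] for name in scores}
for b in range(n_benchmarks):
    ordered = sorted(scores, key=lambda name: scores[name][b], reverse=True)
    for rank, name in enumerate(ordered, start=1):  # rank 1 = best
        ranks[name].append(rank)

borda = {name: sum(r) / n_benchmarks for name, r in ranks.items()}
print(borda)  # model_a wins with the lowest average rank
```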
### English

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/EfilsW_p-JA13mV2ilPkm.png)

**Table 2:** Comparison of open-weight LLMs on English general benchmarks.
\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 2 demonstrate EuroLLM's strong performance on English tasks, surpassing most European-developed models and matching the performance of Mistral-7B (obtaining the same Borda count).

## Bias, Risks, and Limitations

EuroLLM-9B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).