aehrm committed
Commit 171488e · verified · 1 Parent(s): bea7a0d

Update README.md


feel free to propose further changes to this draft

Files changed (1)
  1. README.md +65 -5
README.md CHANGED
@@ -14,19 +14,79 @@ tags:
 
 # ModernGBERT 1B
 
- This is a German ModernBERT 1B language model trained from scratch using the ModernBERT [codebase](https://github.com/AnswerDotAI/ModernBERT) and the same German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2) as our [LLäMmlein](https://huggingface.co/collections/LSX-UniWue/llammlein-6732ff41f3705c686e605762) family.
+ ModernGBERT 1B is a German ModernBERT language model with 1 billion parameters and a native context length of up to 8,192 tokens. It follows the BERT-style architecture and training procedure of the ModernBERT [codebase](https://github.com/AnswerDotAI/ModernBERT).
+ ModernGBERT 1B has been pre-trained on the same 1.27 trillion tokens from the German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2) as our [LLäMmlein](https://huggingface.co/collections/LSX-UniWue/llammlein-6732ff41f3705c686e605762) decoder family.
+
+ We provide two model sizes:
+
+ * [ModernGBERT 1B](https://huggingface.co/LSX-UniWue/ModernGBERT_1B) ← You are here
+   28 layers, hidden size 2,048, 1 billion parameters
+
+ * [ModernGBERT 134M](https://huggingface.co/LSX-UniWue/ModernGBERT_134M)
+   22 layers, hidden size 768, 134 million parameters
+
 Find more details in our [preprint](https://arxiv.org/abs/2505.13136)!
 
+
 ### Usage
 
+ You can use ModernGBERT with the `transformers` library from version v4.48.0 onwards.
+ (Optional: install `flash-attn` to achieve the highest efficiency.)
+
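+ For example, you can request FlashAttention 2 explicitly when loading the model. This snippet is only a sketch and assumes a CUDA GPU with the `flash-attn` package installed; otherwise, simply load the model as in the example below.
+
+ ```python
+ import torch
+ from transformers import AutoModelForMaskedLM
+
+ # Optional: requires a CUDA GPU and the flash-attn package.
+ model = AutoModelForMaskedLM.from_pretrained(
+     "LSX-UniWue/ModernGBERT_1B",
+     attn_implementation="flash_attention_2",
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+ ```
+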
+ Since ModernGBERT 1B is a Masked Language Model (MLM), you can load it via `AutoModelForMaskedLM`. For downstream tasks such as classification, retrieval, or QA, fine-tune the model by following standard BERT fine-tuning recipes (a sketch follows the example below).
+
+ Example using `AutoModelForMaskedLM`:
+
 ```python
- from transformers import AutoModel, AutoTokenizer
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
 
- model = AutoModel.from_pretrained("LSX-UniWue/ModernGBERT_1B")
+ model_id = "LSX-UniWue/ModernGBERT_1B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForMaskedLM.from_pretrained(model_id)
 
- tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/ModernGBERT_1B")
+ text = "Die Hauptstadt von Frankreich ist [MASK]."
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model(**inputs)
+
+ # To get predictions for the mask:
+ masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
+ predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
+ predicted_token = tokenizer.decode(predicted_token_id)
+ print("Predicted token:", predicted_token)
+ # Predicted token: Paris
 ```
 
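+ As a starting point for downstream fine-tuning, here is a minimal sketch for sequence classification with the standard `Trainer` API. The toy inline dataset, the label count, and the training arguments are placeholders for illustration only; adapt them to your task.
+
+ ```python
+ from datasets import Dataset
+ from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
+                           Trainer, TrainingArguments)
+
+ model_id = "LSX-UniWue/ModernGBERT_1B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # num_labels is task-specific; 2 is a placeholder for a binary task.
+ model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
+
+ # Toy dataset purely for illustration -- replace with your own data.
+ train_ds = Dataset.from_dict({
+     "text": ["Das Hotel war ausgezeichnet.", "Der Service war enttäuschend."],
+     "label": [1, 0],
+ })
+ train_ds = train_ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
+
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir="moderngbert-1b-cls", num_train_epochs=3),
+     train_dataset=train_ds,
+     tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
+ )
+ trainer.train()
+ ```
+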
+ **NOTE:** If you want to use HuggingFace's PEFT library for LoRA training, you need to specify the target modules, e.g.:
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+ peft_config = LoraConfig(
+     task_type="TOKEN_CLS", r=8, lora_alpha=32,
+     target_modules=["Wqkv", "Wi", "Wo"],
+ )
+ model = get_peft_model(model, peft_config)
+ ```
+
+
 
 ### Performance
- We evaluated our model on the [SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/) benchmark.
+ We evaluate our models across a broad range of tasks. For natural language understanding, we use the [SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/) benchmark, and for embedding capabilities, we use the [German MTEB](http://mteb-leaderboard.hf.space/?benchmark_name=MTEB%28deu%2C+v1%29) benchmark (after unsupervised fine-tuning of every model on the German mMARCO portion). The following table compares this encoder with other German and multilingual encoders; a simple mean-pooling sketch for obtaining sentence embeddings follows the table. See our [preprint](https://arxiv.org/abs/2505.13136) for more details about the evaluation.
+
+ | Model                            | SuperGLEBer Avg | MTEB Avg  |
+ |----------------------------------|-----------------|-----------|
+ | ModernGBERT 1B<br>(you are here) | **0.808**       | **0.551** |
+ | ModernGBERT 134M                 | 0.749           | 0.501     |
+ | GBERT-base                       | 0.718           | 0.500     |
+ | GBERT-large                      | 0.768           | 0.521     |
+ | GeBERTa-base                     | 0.716           | 0.493     |
+ | GeBERTa-large                    | 0.749           | 0.494     |
+ | GeBERTa-xlarge                   | 0.767           | 0.521     |
+ | Gerturax-3                       | 0.740           | 0.472     |
+ | XLM-RoBERTa-large                | 0.730           | 0.460     |
+ | XLM-RoBERTa-xlarge               | 0.758           | 0.479     |
+
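+ As a minimal illustration (this is not the exact setup used for the MTEB evaluation above), sentence embeddings can be obtained from the encoder by mean pooling over the last hidden states:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "LSX-UniWue/ModernGBERT_1B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModel.from_pretrained(model_id)
+
+ sentences = ["Würzburg liegt am Main.", "Die Stadt Würzburg befindet sich am Fluss Main."]
+ inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+
+ with torch.no_grad():
+     hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)
+
+ # Mean-pool over non-padding tokens to get one vector per sentence.
+ mask = inputs["attention_mask"].unsqueeze(-1)
+ embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
+
+ similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
+ print(f"Cosine similarity: {similarity.item():.3f}")
+ ```
+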
+ ### License
+
+ We release the ModernGBERT models under a research-only RAIL-M license. See [license.md](./license.md) for details.