Update README.md
feel free to propose further changes to this draft
README.md
# ModernGBERT 1B

ModernGBERT 1B is a German ModernBERT language model with 1 billion parameters and a native context length of up to 8,192 tokens. This model follows the same BERT-style architecture and training procedure as the ModernBERT [codebase](https://github.com/AnswerDotAI/ModernBERT).

ModernGBERT 1B has been pre-trained on the same 1.27 trillion tokens from the German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2) as our [LLäMmlein](https://huggingface.co/collections/LSX-UniWue/llammlein-6732ff41f3705c686e605762) decoder family.

We provide two model sizes:

* [ModernGBERT 1B](https://huggingface.co/LSX-UniWue/ModernGBERT_1B) ← You are here
  28 layers, hidden size 2,048, 1 billion parameters
* [ModernGBERT 134M](https://huggingface.co/LSX-UniWue/ModernGBERT_134M)
  22 layers, hidden size 768, 134 million parameters

Find more details in our [preprint](https://arxiv.org/abs/2505.13136)!
### Usage
You can use ModernGBERT with the `transformers` library from version 4.48.0 onwards. Optionally, install `flash-attn` to achieve the highest efficiency.

Since ModernGBERT 1B is a Masked Language Model (MLM), you can load it via `AutoModelForMaskedLM`. For downstream tasks such as classification, retrieval, or QA, fine-tune the model following standard BERT fine-tuning recipes; a minimal sketch is shown after the example below.

Example using `AutoModelForMaskedLM`:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "LSX-UniWue/ModernGBERT_1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "Die Hauptstadt von Frankreich ist [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# To get predictions for the mask:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: Paris
```
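As a starting point for the downstream fine-tuning mentioned above, here is a minimal sketch using `AutoModelForSequenceClassification` and the `Trainer` API. The dataset name, column names, label count, and hyperparameters are placeholders, not the setup used in the preprint:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "LSX-UniWue/ModernGBERT_1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels is task-specific; 2 is a placeholder for a binary task.
# The classification head is newly initialised and trained during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Placeholder dataset: replace with your own data providing "text" and "label" columns.
dataset = load_dataset("your/german-classification-dataset")

def tokenize(batch):
    # The native context length is 8,192 tokens; 512 keeps this sketch lightweight.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="moderngbert-1b-classifier", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    processing_class=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```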
**NOTE:** If you want to use HuggingFace's PEFT library for LoRA training, you need to specify the target modules, e.g.:

```python
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    task_type="TOKEN_CLS", r=8, lora_alpha=32,
    target_modules=["Wqkv", "Wi", "Wo"],
)
model = get_peft_model(model, peft_config)
```
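The target module names above refer to the linear projections in ModernBERT's attention and MLP blocks. As a quick sanity check (assuming `model` is the PEFT-wrapped model from the snippet above), you can confirm that only the LoRA adapters are trainable:

```python
# Prints trainable vs. total parameter counts of the PEFT-wrapped model.
model.print_trainable_parameters()
```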
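You can also use ModernGBERT as a text encoder, e.g. for retrieval; this is the setting behind the German MTEB results below, where every model is additionally fine-tuned in an unsupervised fashion on German mMARCO. The following is only a minimal sketch that mean-pools the last hidden state of the plain pre-trained model, not the embedding setup from the preprint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "LSX-UniWue/ModernGBERT_1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "Würzburg liegt in Bayern.",
    "Die Universität Würzburg befindet sich in Bayern.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Mean pooling over non-padding tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentences
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity: {similarity.item():.3f}")
```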
### Performance
We evaluate our models across a broad range of tasks. For natural language understanding, we use the [SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/) benchmark, and for embedding capabilities, the [German MTEB](http://mteb-leaderboard.hf.space/?benchmark_name=MTEB%28deu%2C+v1%29) benchmark (after unsupervised fine-tuning of every model on the German portion of mMARCO). The following table compares this encoder with other German and multilingual encoders. See our [preprint](https://arxiv.org/abs/2505.13136) for more details about the evaluation.

| Model                            | SuperGLEBer Avg | MTEB Avg  |
|----------------------------------|-----------------|-----------|
| ModernGBERT 1B<br>(you are here) | **0.808**       | **0.551** |
| ModernGBERT 134M                 | 0.749           | 0.501     |
| GBERT-base                       | 0.718           | 0.500     |
| GBERT-large                      | 0.768           | 0.521     |
| GeBERTa-base                     | 0.716           | 0.493     |
| GeBERTa-large                    | 0.749           | 0.494     |
| GeBERTa-xlarge                   | 0.767           | 0.521     |
| Gerturax-3                       | 0.740           | 0.472     |
| XLM-RoBERTa-large                | 0.730           | 0.460     |
| XLM-RoBERTa-xlarge               | 0.758           | 0.479     |
### License
We release the ModernGBERT models under a research-only RAIL-M license. See [license.md](./license.md) for details.