GeistBERT-Longformer

GeistBERT-Longformer is a German language model designed for long-context NLP tasks. It extends GeistBERT with the Longformer self-attention mechanism, enabling the processing of significantly longer sequences while remaining efficient.
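
The snippet below is a minimal loading sketch, not part of the original card: it assumes the Hugging Face transformers library, the model ID GeistBERT/GeistBERT_base_longformer shown on this page, and a RoBERTa-style `<mask>` token.

```python
# Hedged sketch: load the checkpoint and run a masked-language-model forward pass.
# Model ID taken from this page; the <mask> token is assumed (RoBERTa convention).
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "GeistBERT/GeistBERT_base_longformer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("Berlin ist die <mask> von Deutschland.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```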

This variant is particularly well-suited for:

  • Document-level tasks such as legal text analysis, summarization, and passage retrieval.
  • Tasks requiring extended context windows beyond traditional transformer limits.

Key Features:

  • Sliding-window attention: Efficient self-attention mechanism that scales to longer sequences.
  • Extended context length: Allows processing of larger text spans compared to standard BERT/RoBERTa (see the sketch after this list).
  • Optimized for German: Pre-trained on a largely deduplicated German corpus (OSCAR23, OPUS, MC4).
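
The sketch below illustrates how the sliding-window attention and extended context length are typically exercised. It is an assumption based on the standard Longformer API in transformers (global_attention_mask, 4096-token window), not something stated on this card.

```python
# Hedged sketch: encode a long German document. The 4096-token limit and the
# global_attention_mask argument follow the usual Longformer interface and are
# assumptions for this checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "GeistBERT/GeistBERT_base_longformer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_text = " ".join(["Dies ist ein sehr langes deutsches Dokument."] * 400)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Only the first token (<s>) attends globally; all other tokens use the
# efficient sliding-window attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```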

Compared to Nyströmformer and standard RoBERTa, GeistBERT-Longformer requires significantly more VRAM, often necessitating multi-GPU training with gradient accumulation for large batch sizes.
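
The following is an illustrative sketch of the kind of fine-tuning configuration the note above refers to: a small per-device batch combined with gradient accumulation to reach a large effective batch size. All hyperparameter values are examples, not recommendations from the model card.

```python
# Hedged sketch: gradient accumulation with transformers.TrainingArguments.
# Values are illustrative only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="geistbert-longformer-finetune",
    per_device_train_batch_size=2,    # long sequences are VRAM-heavy
    gradient_accumulation_steps=16,   # effective batch = 2 * 16 * num_gpus
    fp16=True,                        # mixed precision to reduce memory use
    num_train_epochs=3,
    learning_rate=3e-5,
)
```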

For more details, see GeistBERT on Hugging Face.
