Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs • arXiv:2505.04519 • Published May 2025
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations • arXiv:2505.02819 • Published May 2025
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation • arXiv:2505.00022 • Published May 2025
Evaluating the Quality of Benchmark Datasets for Low-Resource Languages: A Case Study on Turkish • arXiv:2504.09714 • Published Apr 13, 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance • arXiv:2504.08716 • Published Apr 11, 2025
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • arXiv:2412.13663 • Published Dec 18, 2024
EuroBERT: Scaling Multilingual Encoders for European Languages • arXiv:2503.05500 • Published Mar 7, 2025