Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language Paper • 2504.19856 • Published about 23 hours ago • 1
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family Paper • 2504.18225 • Published 4 days ago • 6
A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics Paper • 2504.16677 • Published 6 days ago • 1
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance Paper • 2504.08716 • Published 18 days ago • 10
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 20 days ago • 73
Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation Paper • 2504.06225 • Published 21 days ago • 1
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published 25 days ago • 13
Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier Paper • 2504.00178 • Published 29 days ago • 1
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents Paper • 2504.00414 • Published 28 days ago • 1
Overcoming Vocabulary Constraints with Pixel-level Fallback Paper • 2504.02122 • Published 27 days ago • 2
E3C-Projected Collection This collection contains the datasets of English layer one of E3C projected into Greek, Italian, Polish, Slovak, and Slovenian • 11 items • Updated Jan 8 • 1
State Fourier Diffusion Language Model (SFDLM): A Scalable, Novel Iterative Approach to Language Modeling Paper • 2503.17382 • Published Mar 16 • 1
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 31
UniBERTs: Adversarial Training for Language-Universal Representations Paper • 2503.12608 • Published Mar 16 • 1
Do Construction Distributions Shape Formal Language Learning In German BabyLMs? Paper • 2503.11593 • Published Mar 14 • 1
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31, 2024 • 4