- Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models • Paper • arXiv:2505.16538 • Published 3 days ago • 2 upvotes
- Tracing Multilingual Factual Knowledge Acquisition in Pretraining • Paper • arXiv:2505.14824 • Published 4 days ago • 3 upvotes
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • arXiv:2505.03335 • Published 19 days ago • 160 upvotes
- ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning • Paper • arXiv:2111.10952 • Published Nov 22, 2021 • 2 upvotes
- Eynollah models • Collection • Eynollah models for document image processing and layout analysis tasks • 14 items • Updated Mar 27 • 3 upvotes
- Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters • Paper • arXiv:2505.04393 • Published 18 days ago • 1 upvote
- Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs • Paper • arXiv:2505.04519 • Published 17 days ago • 2 upvotes
- ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations • Paper • arXiv:2505.02819 • Published 19 days ago • 24 upvotes
- Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation • Paper • arXiv:2505.00022 • Published about 1 month ago • 1 upvote
- Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language • Paper • arXiv:2504.19856 • Published 26 days ago • 1 upvote
- Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family • Paper • arXiv:2504.18225 • Published 30 days ago • 12 upvotes
- A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics • Paper • arXiv:2504.16677 • Published Apr 23 • 1 upvote
- ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance • Paper • arXiv:2504.08716 • Published Apr 11 • 10 upvotes
- OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens • Paper • arXiv:2504.07096 • Published Apr 9 • 73 upvotes
- Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation • Paper • arXiv:2504.06225 • Published Apr 8 • 1 upvote