Masader: Metadata Sourcing for Arabic Text and Speech Data Resources Paper • 2110.06744 • Published Oct 13, 2021
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training Paper • 2410.20796 • Published Oct 28, 2024
Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches Paper • 2307.06218 • Published Jul 12, 2023
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs Paper • 2505.19800 • Published 3 days ago • 1
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees Paper • 2110.03313 • Published Oct 7, 2021 • 1
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient Paper • 2301.11913 • Published Jan 27, 2023 • 1
A critical look at the evaluation of GNNs under heterophily: Are we really making progress? Paper • 2302.11640 • Published Feb 22, 2023 • 1
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
Petals: Collaborative Inference and Fine-tuning of Large Models Paper • 2209.01188 • Published Sep 2, 2022 • 1
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding Paper • 2402.12374 • Published Feb 19, 2024 • 3
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models Paper • 2404.05904 • Published Apr 8, 2024 • 9