Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 164
Article Fine-Tune ViT for Image Classification with 🤗 Transformers nateraw • Feb 11, 2022 • 61
Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness Paper • 2603.08309 • Published Mar 9 • 12
Unsloth 4-bit Dynamic Quants Collection Unsloth's Dynamic 4-bit Quants selectively skip quantizing certain parameters, greatly improving accuracy while using only <10% more VRAM than BnB 4-bit • 28 items • Updated Apr 22 • 96
Article Streaming datasets: 100x More Efficient +3 andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 86
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning Paper • 2506.14432 • Published Jun 17, 2025 • 4