Article CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • 11 days ago • 45
LongAttn: Selecting Long-context Training Data via Token-level Attention Paper • 2502.16860 • Published Feb 24 • 1
Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • Sep 27, 2024 • 46
CLIPPER Collection Models and datasets for CLIPPER: Compression enables long-context synthetic data generation • 6 items • Updated Feb 20 • 3
Article Deploying Your FastAPI Applications on Huggingface Via Docker By HemanthSai7 • Dec 11, 2023 • 32
Towards the Law of Capacity Gap in Distilling Language Models Paper • 2311.07052 • Published Nov 13, 2023 • 2
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 232
Article From PyTorch DDP to 🤗 Accelerate to 🤗 Trainer, mastery of distributed training with ease By muellerzr • Oct 21, 2022 • 31
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 83
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 43