TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 2025 • 13
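The premise in the title, a single block that can serve tokens through either an attention path or a Mamba-style SSM path, can be sketched as below. This is only an illustrative toy under our own assumptions: the paper's actual weight-sharing and switching scheme is more involved and is not shown, and the recurrent branch here is a plain gated linear scan, not a real Mamba mixer.

```python
import torch

class SwitchBlock(torch.nn.Module):
    """Toy hybrid block: attention for short inputs, a linear scan for long ones."""
    def __init__(self, d_model: int, n_heads: int = 8, switch_len: int = 2048):
        super().__init__()
        self.switch_len = switch_len
        self.attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stand-in for a Mamba mixer: a gated linear recurrence (assumption).
        self.decay = torch.nn.Parameter(torch.zeros(d_model))
        self.proj = torch.nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.size(1) <= self.switch_len:      # short sequences: quadratic attention
            out, _ = self.attn(x, x, x, need_weights=False)
            return out
        a = torch.sigmoid(self.decay)         # long sequences: linear-time scan
        h, outs = torch.zeros_like(x[:, 0]), []
        for t in range(x.size(1)):
            h = a * h + x[:, t]               # h_t = a * h_{t-1} + x_t
            outs.append(h)
        return self.proj(torch.stack(outs, dim=1))
```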
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Paper • 2502.15872 • Published Feb 21 • 5
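Reading the title literally: plans are searched over rather than executed, and each plan step is grounded in symbols from the target repository. A minimal data-model sketch of that framing follows; all structures and names are our assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    intent: str            # natural-language description of the step
    symbols: list[str]     # repository functions/classes that could realize it

@dataclass
class Plan:
    steps: list[PlanStep] = field(default_factory=list)
    score: float = 0.0     # ranker score; no code is ever executed

def expand(plan: Plan, candidates: list[PlanStep]) -> list[Plan]:
    """Successor function for a best-first search over partial plans."""
    return [Plan(plan.steps + [c], plan.score) for c in candidates]
```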
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published Feb 24 • 17
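Stable-SPAM extends the SPAM line of optimizer stabilizers (spike-aware clipping plus momentum reset). Below is a hedged sketch of one such ingredient, clipping gradient elements that spike far above a bias-corrected running second moment; the threshold and decay values are assumptions, and the 4-bit quantization the paper actually targets is omitted.

```python
import torch

class SpikeClipper:
    """Clip gradient elements that spike above their running second moment."""
    def __init__(self, params, theta: float = 50.0, beta: float = 0.999):
        self.params = list(params)
        self.theta, self.beta, self.t = theta, beta, 0
        self.v = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def clip(self):
        self.t += 1
        bias = 1 - self.beta ** self.t               # bias correction, as in Adam
        for p, v in zip(self.params, self.v):
            if p.grad is None:
                continue
            v.mul_(self.beta).addcmul_(p.grad, p.grad, value=1 - self.beta)
            limit = (self.theta * v / bias).sqrt()   # per-element spike threshold
            p.grad.copy_(torch.minimum(torch.maximum(p.grad, -limit), limit))
```

In a training loop one would call `clip()` just before `optimizer.step()`; the momentum-reset half of the recipe amounts to clearing the optimizer's state every few thousand steps.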
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets • 6 items • Updated Feb 25 • 13
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 26 • 7
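The recipe in the title can be sketched directly: each expert of the sparse MoE is seeded with the pretrained dense FFN's weights, and a fraction of each expert's hidden units is re-initialized so the experts can diversify. The ratio, the init scale, and the choice of which units to reset below are illustrative assumptions, not the paper's exact scheme.

```python
import copy
import torch

def drop_upcycle(dense_ffn: torch.nn.Sequential, num_experts: int,
                 reinit_ratio: float = 0.5) -> torch.nn.ModuleList:
    """Seed each expert from the dense FFN, re-initializing a slice of it."""
    d_hidden = dense_ffn[0].out_features
    experts = torch.nn.ModuleList()
    for _ in range(num_experts):
        expert = copy.deepcopy(dense_ffn)            # inherit the dense weights
        idx = torch.randperm(d_hidden)[: int(reinit_ratio * d_hidden)]
        with torch.no_grad():                        # re-init the chosen units
            expert[0].weight[idx] = torch.randn(len(idx), expert[0].in_features) * 0.02
            expert[2].weight[:, idx] = torch.randn(expert[2].out_features, len(idx)) * 0.02
        experts.append(expert)
    return experts

# Example: upcycle an (up-proj, GELU, down-proj) FFN into 8 experts.
ffn = torch.nn.Sequential(torch.nn.Linear(512, 2048), torch.nn.GELU(), torch.nn.Linear(2048, 512))
experts = drop_upcycle(ffn, num_experts=8)
```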
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? Paper • 2502.11895 • Published Feb 17 • 1
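For context on what "1.58-bit" means here: BitNet-style layers constrain weights to the ternary set {-1, 0, +1} (log2 3 ≈ 1.58 bits), typically via absmean quantization with a straight-through estimator. A minimal sketch of that quantization step follows; the wrapper class and names are ours, not the paper's code.

```python
import torch

def absmean_ternary(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} scaled by their mean absolute value."""
    scale = w.abs().mean().clamp(min=1e-8)
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: forward uses quantized weights, backward
    # passes gradients through as if the quantizer were the identity.
    return w + (w_q - w).detach()

class TernaryLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, absmean_ternary(self.weight), self.bias)
```

Transitioning from 16-bit to 1.58-bit pre-training then amounts to swapping `nn.Linear` layers for quantization-aware ones and continuing training; the paper's question is when in the schedule to make that swap.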
Hamanasu Collection A new series of models from yours truly, designed for intelligence, creativity, and roleplay • 31 items • Updated 2 days ago • 8
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization Paper • 2502.02631 • Published Feb 4 • 3
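A scaling-law study across bit-widths needs a quantizer parameterized by bit count so that runs at different precisions are comparable. Below is a generic symmetric fake-quantizer with a straight-through estimator as an assumed stand-in; ParetoQ's own quantization functions and training recipe differ, and the ternary (1.58-bit) case is the absmean quantizer sketched above.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric k-bit fake quantization with a straight-through estimator."""
    if bits == 1:                          # binary: {-s, +s}
        w_q = w.sign() * w.abs().mean()
    else:                                  # uniform grid with 2^(k-1) - 1 levels per side
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        w_q = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()          # STE backward

# Sweeping bits over {1, 2, 3, 4} while varying model size trades accuracy
# against memory, tracing the Pareto frontier the title refers to.
```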
Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales Paper • 2502.01908 • Published Feb 4 • 1
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 44
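The naive baseline for this setting is quantization-aware training with a straight-through estimator applied to both weights and activations, sketched below. QuEST's actual method refines both the forward quantizer and the gradient estimator; only the generic 1-bit baseline the title implies is shown here, and all names are illustrative.

```python
import torch

def binarize(t: torch.Tensor) -> torch.Tensor:
    """Map a tensor to {-s, +s}, with s preserving the mean magnitude."""
    scale = t.abs().mean().clamp(min=1e-8)
    t_q = t.sign() * scale
    return t + (t_q - t).detach()  # STE backward

class BinaryLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both activations and weights are binarized in the forward pass.
        return torch.nn.functional.linear(binarize(x), binarize(self.weight), self.bias)
```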
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 18
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 38