MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections • Paper 2502.12170 • Published Feb 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers • Paper 2502.15007 • Published Feb 2025
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? • Paper 2502.14502 • Published Feb 2025
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity • Paper 2502.13063 • Published Feb 2025
Llama 3.2 Collection • Hosts the transformers-format and original repos of the Llama 3.2 and Llama Guard 3 models • 15 items • Updated Dec 6, 2024
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation • Paper 2412.06531 • Published Dec 9, 2024
Qwen2.5 Collection • Qwen2.5 language models, pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B • 46 items