ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer · 4 authors · submitted by xiaol · 23 upvotes · 2 comments
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation · 14 authors · submitted by HarryHe · 15 upvotes · 2 comments
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models · 6 authors · submitted by eliebak · 10 upvotes · 2 comments
iFormer: Integrating ConvNet and Transformer for Mobile Application · 1 author · submitted by akhaliq · 10 upvotes · 2 comments
Are Vision Language Models Texture or Shape Biased and Can We Steer Them? · 8 authors · submitted by nielsr · 9 upvotes · 2 comments
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity · 6 authors · submitted by akhaliq · 7 upvotes · 1 comment
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas · 6 authors · submitted by xywang1 · 6 upvotes · 2 comments
Return of the Encoder: Maximizing Parameter Efficiency for SLMs · 3 authors · submitted by melfeki11 · 5 upvotes · 2 comments