DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition Paper • 2508.08938 • Published 4 days ago • 8
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences Paper • 2508.03542 • Published 11 days ago • 4
Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift Paper • 1801.05134 • Published Jan 16, 2018 • 1
CMIC: Content-Adaptive Mamba for Learned Image Compression Paper • 2508.02192 • Published 12 days ago • 1
Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification Paper • 2508.05489 • Published 9 days ago • 1
SpectroStream: A Versatile Neural Codec for General Audio Paper • 2508.05207 • Published 9 days ago • 1
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens Paper • 2508.05305 • Published 9 days ago • 37
Hidden Dynamics of Massive Activations in Transformer Training Paper • 2508.03616 • Published 11 days ago • 17
Adapting Vision-Language Models Without Labels: A Comprehensive Survey Paper • 2508.05547 • Published 9 days ago • 10
Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression Paper • 2508.04979 • Published 9 days ago • 5
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes Paper • 2508.05630 • Published 9 days ago • 9
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published 9 days ago • 67
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference Paper • 2508.04586 • Published 10 days ago • 12
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval Paper • 2507.23284 • Published 16 days ago • 3
Representation Shift: Unifying Token Compression with FlashAttention Paper • 2508.00367 • Published 15 days ago • 15
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation Paper • 2508.03320 • Published 11 days ago • 59
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published 13 days ago • 12
Dynaword: From One-shot to Continuously Developed Datasets Paper • 2508.02271 • Published 12 days ago • 13
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo Paper • 2508.02317 • Published 12 days ago • 15