Submitted by tellarin 66 Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia · 92 authors 1
Submitted by ColeYzzzz 47 LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL · 10 authors 2
Submitted by a43992899 42 YuE: Scaling Open Foundation Models for Long-Form Music Generation · 57 authors 1
Submitted by Xuerui123 25 UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models · 8 authors 2
Submitted by Owen777 25 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice · 13 authors 1
Submitted by Z-MU-Z 22 SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories · 8 authors 1
Submitted by wujie10 22 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model · 28 authors 1
Submitted by hsaest 16 Implicit Reasoning in Transformers is Reasoning through Shortcuts · 4 authors 1
Submitted by Harold328 14 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization · 11 authors 1
Submitted by subin-kim 14 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling · 5 authors 1
Submitted by CohenQu 14 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning · 7 authors 1
Submitted by LegendBC 12 OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models · 5 authors 1
Submitted by Jianxiong 11 CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing · 5 authors 1
Submitted by MaverickAlex 7 ^RFLAV: Rolling Flow matching for infinite Audio Video generation · 7 authors 1
Submitted by KID-22 5 Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents · 9 authors 1
Submitted by XinXuNLPer 5 BiasEdit: Debiasing Stereotyped Language Models via Model Editing · 4 authors 1
Submitted by RohamKoohestani 5 Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol · 3 authors 1
Submitted by Jinfa 4 QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension · 11 authors 1
Submitted by XiaXin-Aloys 4 RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories · 6 authors 1
Submitted by kwanY 3 AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models · 4 authors 1
Submitted by luoyingfeng 2 Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation · 11 authors 1
Submitted by WYLing 2 VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering · 10 authors 1