Submitted by Elizaveta · 70 · When Less is Enough: Adaptive Token Reduction for Efficient Image Representation · 3 authors · 2
Submitted by VentureZJ · 52 · MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving · 9 authors · 2
Submitted by VentureZJ · 43 · MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization · 6 authors · 2
Submitted by IranQin · 39 · RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints · 8 authors · 2
Submitted by Epiphqny · 34 · Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation · 7 authors · 4
Submitted by akhaliq · 33 · Modifying Large Language Model Post-Training for Diverse Creative Writing · 5 authors · 2
Submitted by akhaliq · 23 · TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting · 7 authors · 2
Submitted by ydeng9 · 21 · OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement · 6 authors · 2
Submitted by JacobYuan · 13 · MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems · 8 authors · 3
Submitted by Guan123 · 11 · ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering · 8 authors · 2
Submitted by hitsmy · 9 · From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration · 4 authors · 2
Submitted by akhaliq · 8 · FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models · 7 authors · 3
Submitted by ChengmingX · 6 · When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO · 8 authors · 2
Submitted by ZhaochongAn · 5 · Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model · 7 authors · 2