Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 145
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL Paper • 2505.15791 • Published May 21 • 5
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18 • 10
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6 • 8
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28 • 39
Exploring the Evolution of Physics Cognition in Video Generation: A Survey Paper • 2503.21765 • Published Mar 27 • 11
Accelerating Diffusion Transformers with Token-wise Feature Caching Paper • 2410.05317 • Published Oct 5, 2024
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 21
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Paper • 2412.06782 • Published Dec 9, 2024 • 7
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 15