PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models Paper • 2504.16074 • Published 4 days ago • 29
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published 16 days ago • 27
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting Paper • 2504.09048 • Published 14 days ago • 7
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL Paper • 2504.11455 • Published 11 days ago • 12
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published 11 days ago • 17
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 12 days ago • 241
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Paper • 2504.08727 • Published 15 days ago • 11
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 15 days ago • 121
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 18 days ago • 151
EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling Paper • 2504.02402 • Published 23 days ago • 6
Articulated Kinematics Distillation from Video Diffusion Models Paper • 2504.01204 • Published 25 days ago • 24
SketchVideo: Sketch-based Video Generation and Editing Paper • 2503.23284 • Published 27 days ago • 23
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization Paper • 2503.19901 • Published Mar 25 • 39
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs Paper • 2503.23022 • Published 28 days ago • 7
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling Paper • 2503.21732 • Published 30 days ago • 8
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging Paper • 2503.22236 • Published 29 days ago • 11
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 30 days ago • 22