Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models Paper • 2412.14058 • Published Dec 18, 2024 • 1
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14, 2025 • 35
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper • 2506.07961 • Published Jun 9, 2025 • 12
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation Paper • 2412.14015 • Published Dec 18, 2024 • 12
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs Paper • 2410.03645 • Published Oct 4, 2024 • 3