JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
OpenX-LeRobot Collection Open X-Embodiment datasets in LeRobot format with standard transformation (https://github.com/Tavish9/any4lerobot) • 34 items • Updated 6 days ago • 14
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control By danaaubakirova and 3 others • Feb 4 • 167
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution Paper • 2312.06640 • Published Dec 11, 2023 • 48
PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining Paper • 2303.08789 • Published Mar 15, 2023 • 1
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 55
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024 • 18
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 35
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 27
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published Nov 12, 2024 • 31
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 27
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 40
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4, 2024 • 24