Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2501.03841

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published 22 days ago • 67
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 28 days ago • 41
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published 21 days ago • 52
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Paper • 2501.04003 • Published 21 days ago • 24

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published 21 days ago • 52

Structured Agent Operations

Agents for self-driving laboratories applied to quantum computing

Paper • 2412.07978 • Published Dec 10, 2024 • 1
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges

Paper • 2412.11427 • Published Dec 16, 2024 • 1
AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions

Paper • 2411.18015 • Published Nov 27, 2024 • 1
LLM4SR: A Survey on Large Language Models for Scientific Research

Paper • 2501.04306 • Published 21 days ago • 33

Robotic 🤖🔧

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published 21 days ago • 52
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published 25 days ago • 50

2025 LLM Papers on Hugging Face with Japanese Memos

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published 22 days ago • 40
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 27 days ago • 98
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 7 days ago • 78
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published 12 days ago • 24

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 37
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 26
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15, 2024 • 24
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

Paper • 2407.10973 • Published Jul 15, 2024 • 10
Cross Anything: General Quadruped Robot Navigation through Complex Terrains

Paper • 2407.16412 • Published Jul 23, 2024 • 6
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Paper • 2408.11048 • Published Aug 20, 2024 • 4

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Paper • 2406.02523 • Published Jun 4, 2024 • 11
UniT: Unified Tactile Representation for Robot Learning

Paper • 2408.06481 • Published Aug 12, 2024 • 10
Latent Action Pretraining from Videos

Paper • 2410.11758 • Published Oct 15, 2024 • 2
Neural Fields in Robotics: A Survey

Paper • 2410.20220 • Published Oct 26, 2024 • 4

about 7 hours ago

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 15
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 6

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs