Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Paper • 2403.11487 • Published Mar 18, 2024 • 1
Style Customization of Text-to-Vector Generation with Image Diffusion Priors Paper • 2505.10558 • Published about 1 month ago • 15
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 176
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 3 days ago • 51
view article Article Cohere on Hugging Face Inference Providers 🔥 By burtenshaw and 6 others • Apr 16 • 126
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 13
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published Apr 15 • 19
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11 • 40
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11 • 27
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding Paper • 2504.01943 • Published Apr 2 • 15
view article Article Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖 By thomwolf and 2 others • Apr 14 • 46
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 48
view article Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC By freddyaboulton • Apr 9 • 26