Llama 3.3 Collection • This collection hosts the transformers and original repos of the Llama 3.3 • 1 item • Updated Dec 6, 2024 • 121
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 6 days ago • 28
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 6 days ago • 21
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 7 days ago • 73
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 7 days ago • 260
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 8 days ago • 79
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published 13 days ago • 19
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 15 days ago • 31
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 13 days ago • 33
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 14 days ago • 30
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 15 days ago • 270
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 19 days ago • 59
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 19 days ago • 66
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input Paper • 2501.03200 • Published 23 days ago • 1
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published 21 days ago • 23
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 22 days ago • 84
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 23 days ago • 67
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 23 days ago • 52