3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding Paper • 2412.18450 • Published 1 day ago • 28
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 3 days ago • 21
NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published 5 days ago • 6
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 3 days ago • 29
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 3 days ago • 35
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published 3 days ago • 34
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 6 days ago • 33
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 7 days ago • 25
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion Paper • 2412.14462 • Published 7 days ago • 15
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 14 days ago • 19
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Paper • 2412.03517 • Published 22 days ago • 18
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 22 days ago • 118
Imagine360: Immersive 360 Video Generation from Perspective Anchor Paper • 2412.03552 • Published 22 days ago • 26
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published 27 days ago • 55