Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Paper • 2412.15797 • Published 6 days ago • 4
MotiF: Making Text Count in Image Animation with Motion Focal Loss Paper • 2412.16153 • Published 5 days ago • 3
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published 7 days ago • 3
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding Paper • 2412.18450 • Published 1 day ago • 26
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models Paper • 2412.18608 • Published 1 day ago • 6
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Paper • 2412.18597 • Published 1 day ago • 12
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval Paper • 2412.15443 • Published 6 days ago • 6
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published 7 days ago • 8
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 2 days ago • 23
IDOL: Instant Photorealistic 3D Human Creation from a Single Image Paper • 2412.14963 • Published 7 days ago • 5
Outcome-Refining Process Supervision for Code Generation Paper • 2412.15118 • Published 6 days ago • 14
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published 3 days ago • 8
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning Paper • 2412.16849 • Published 4 days ago • 5
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding Paper • 2412.17295 • Published 3 days ago • 8
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published 3 days ago • 15