ResearchTown: Simulator of Human Research Community • Paper • 2412.17767 • Published 2 days ago • 10 • 2
Large Motion Video Autoencoding with Cross-modal Video VAE • Paper • 2412.17805 • Published 2 days ago • 20 • 3
Deliberation in Latent Space via Differentiable Cache Augmentation • Paper • 2412.17747 • Published 2 days ago • 23 • 4
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper • 2412.14711 • Published 7 days ago • 8 • 2
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published 1 day ago • 10 • 2
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning • Paper • 2412.15797 • Published 6 days ago • 4 • 2
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization • Paper • 2412.17739 • Published 2 days ago • 21 • 3
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding • Paper • 2412.18450 • Published 1 day ago • 25 • 2
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval • Paper • 2412.15443 • Published 6 days ago • 6 • 2
In Case You Missed It: ARC 'Challenge' Is Not That Challenging • Paper • 2412.17758 • Published 2 days ago • 9 • 2
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models • Paper • 2412.18608 • Published 1 day ago • 5 • 2
MotiF: Making Text Count in Image Animation with Motion Focal Loss • Paper • 2412.16153 • Published 5 days ago • 3 • 2
Bridging the Data Provenance Gap Across Text, Speech and Video • Paper • 2412.17847 • Published 7 days ago • 3 • 2