DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation • Paper • 2412.15200 • Published 6 days ago • 9
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency • Paper • 2412.15216 • Published 6 days ago • 5
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis • Paper • 2412.15214 • Published 6 days ago • 14
No More Adam: Learning Rate Scaling at Initialization is All You Need • Paper • 2412.11768 • Published 10 days ago • 41
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation • Paper • 2412.14015 • Published 8 days ago • 12
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • Paper • 2412.14161 • Published 7 days ago • 43
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers • Paper • 2412.12276 • Published 9 days ago • 15
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models • Paper • 2412.12606 • Published 9 days ago • 41
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity • Paper • 2412.09856 • Published 13 days ago • 9
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation • Paper • 2412.08645 • Published 14 days ago • 11
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding • Paper • 2412.09604 • Published 13 days ago • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 12 days ago • 131
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs • Paper • 2412.11258 • Published 10 days ago • 13
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors • Paper • 2412.11586 • Published 10 days ago • 11
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion • Paper • 2412.09626 • Published 13 days ago • 19
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders • Paper • 2412.09586 • Published 13 days ago • 5
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions • Paper • 2412.08737 • Published 14 days ago • 51
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation • Paper • 2412.06016 • Published 17 days ago • 20