Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published about 1 month ago • 22
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 107
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting Paper • 2411.17176 • Published Nov 26, 2024 • 24
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published Sep 12, 2024 • 23
Course-Correction: Safety Alignment Using Synthetic Preferences Paper • 2407.16637 • Published Jul 23, 2024 • 27
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow Paper • 2306.07209 • Published Jun 12, 2023 • 2
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives Paper • 2401.02009 • Published Jan 4, 2024 • 1
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024 • 47
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields Paper • 2311.11845 • Published Nov 20, 2023 • 1
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting Paper • 2405.19957 • Published May 30, 2024 • 10
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12, 2024 • 15
MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper • 2310.08465 • Published Oct 12, 2023 • 16