Towards Semantic Equivalence of Tokenization in Multimodal LLM Paper • 2406.05127 • Published Jun 7, 2024
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection Paper • 2505.18660 • Published 23 days ago • 1
PixelThink: Towards Efficient Chain-of-Pixel Reasoning Paper • 2505.23727 • Published 17 days ago • 4
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Paper • 2505.23606 • Published 17 days ago • 14
Conditional Panoramic Image Generation via Masked Autoregressive Modeling Paper • 2505.16862 • Published 24 days ago
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published 12 days ago • 3
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation Paper • 2505.12620 • Published 28 days ago
CyberV: Cybernetics for Test-time Scaling in Video Understanding Paper • 2506.07971 • Published 6 days ago • 4
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published 22 days ago • 7
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 80
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Paper • 2504.12080 • Published Apr 16 • 7
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark Paper • 2105.02440 • Published May 6, 2021
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published Apr 22 • 15
RelationBooth: Towards Relation-Aware Customized Object Generation Paper • 2410.23280 • Published Oct 30, 2024 • 1
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Paper • 2409.15179 • Published Sep 23, 2024
PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners Paper • 2410.04733 • Published Oct 7, 2024
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Paper • 2501.04670 • Published Jan 8