VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 3 days ago • 19
RelationBooth: Towards Relation-Aware Customized Object Generation Paper • 2410.23280 • Published Oct 30, 2024 • 1
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Paper • 2409.15179 • Published Sep 23, 2024
PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners Paper • 2410.04733 • Published Oct 7, 2024
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Paper • 2501.04670 • Published Jan 8
Point Cloud Mamba: Point Cloud Learning via State Space Model Paper • 2403.00762 • Published Mar 1, 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Paper • 2401.02361 • Published Jan 4, 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection Paper • 2404.06564 • Published Apr 9, 2024
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model Paper • 2412.04292 • Published Dec 5, 2024
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation Paper • 2410.10676 • Published Oct 14, 2024