OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis Paper • 2501.04561 • Published Jan 8, 2025
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning Paper • 2405.12710 • Published May 21, 2024
Channel Importance Matters in Few-Shot Image Classification Paper • 2206.08126 • Published Jun 16, 2022
Rectifying the Shortcut Learning of Background for Few-Shot Learning Paper • 2107.07746 • Published Jul 16, 2021