FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1 • 30