MuLan: A Joint Embedding of Music Audio and Natural Language Paper • 2208.12415 • Published Aug 26, 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models Paper • 2205.01917 • Published May 4, 2022