Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 Image-Text-to-Text • Updated 1 day ago • 7.85k • 47
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22, 2024 • 36
Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5 Image-Text-to-Text • Updated 1 day ago • 104 • 15
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 98
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models Paper • 2209.07511 • Published Sep 15, 2022