BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 2 days ago • 57
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 15 days ago • 41
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption Paper • 2503.09279 • Published Mar 12 • 5
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption Paper • 2503.09279 • Published Mar 12 • 5
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption Paper • 2503.09279 • Published Mar 12 • 5 • 2