UniTok: A Unified Tokenizer for Visual Generation and Understanding • Paper 2502.20321 • Published Feb 27
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning • Paper 2503.04812 • Published Mar 4