arxiv:2511.19365
ZehongMa
zehongma
AI & ML interests
MLLMs, Image/Video Generation, Multi-modal Representation Learning
Recent Activity
upvoted an article about 11 hours ago
PRX Part 3 — Training a Text-to-Image Model in 24h!
upvoted a paper 22 days ago
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Organizations
None yet