Zedong Wang
ZedongWangAI
AI & ML interests
Computer Vision, Multi-task Learning, Multi-modal Learning.
Recent Activity
upvoted a paper about 4 hours ago
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
upvoted a paper about 16 hours ago
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
upvoted a paper about 16 hours ago
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Organizations
Collections 2
- Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
  Paper • 2410.06373 • Published • 34
- MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
  Paper • 2504.00999 • Published • 83
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 53
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 130
Models 0
None public yet
Datasets 0
None public yet