Jie Shao

hehesang

http://www.lamda.nju.edu.cn/shaoj/

hehesangsj

AI & ML interests

computer vision, ai for science

hehesang's activity

upvoted a paper about 1 month ago

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5 • 75

authored a paper 2 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 274

upvoted a paper 2 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 274

upvoted 3 papers 3 months ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3 • 69

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25 • 51

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 37

New activity in google/siglip2-giant-opt-patch16-384 4 months ago

AutoModel.from_pretrained error in loading state_dict

#3 opened 4 months ago by Srymaker

upvoted a paper 6 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 39

authored a paper 6 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 39

liked a model 6 months ago

OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated Mar 25 • 1.56k • 191

commented on a paper 7 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 80

upvoted a paper 7 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 80

upvoted a paper about 1 year ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 55

liked a model about 1 year ago

OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B

Visual Question Answering • Updated Aug 24, 2024 • 38 • 8