Zhengyuan Yang PRO

zyang39

https://zhengyuan.info/

AI & ML interests

None yet

Recent Activity

upvoted a paper 29 days ago

STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models

updated a dataset about 2 months ago

zyang39/CoSyn-400K-merged-caption

authored a paper 2 months ago

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Paper • 2506.10128 • Published Jun 11 • 23

authored a paper 3 months ago

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

Paper • 2505.08617 • Published May 13 • 42

authored 2 papers 4 months ago

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Paper • 2504.07934 • Published Apr 10 • 20

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

Paper • 2504.06148 • Published Apr 8 • 13

authored a paper 5 months ago

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Paper • 2503.20198 • Published Mar 26 • 4

authored a paper 7 months ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15

authored a paper 8 months ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published Dec 12, 2024 • 11

authored 2 papers 10 months ago

GenXD: Generating Any 3D and 4D Scenes

Paper • 2411.02319 • Published Nov 4, 2024 • 20

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Paper • 2410.23277 • Published Oct 30, 2024 • 9

authored 3 papers about 1 year ago

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1, 2024 • 14

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 29

authored 4 papers over 1 year ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18

authored 4 papers almost 2 years ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 15

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023 • 20

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 14

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Paper • 2310.08541 • Published Oct 12, 2023 • 18
