Yuhao Dong

THUdyh

AI & ML interests

None yet

Recent Activity

updated a model 1 day ago

THUdyh/Ola-7b

new activity 1 day ago

THUdyh/Ola_speech_encoders:Add audio-feature-extraction pipeline tag, library name, and project page URL

new activity 1 day ago

THUdyh/Ola-Image:Upload config.json

upvoted a paper 1 day ago

Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published 4 days ago • 32

upvoted a paper 8 days ago

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Paper • 2506.13654 • Published 8 days ago • 42

upvoted 2 papers 19 days ago

Video World Models with Long-term Spatial Memory

Paper • 2506.05284 • Published 19 days ago • 52

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Paper • 2506.05344 • Published 19 days ago • 16

upvoted a paper about 1 month ago

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 145

upvoted 4 papers about 2 months ago

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Paper • 2505.05467 • Published May 8 • 13

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 173

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 77

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5 • 26

upvoted 4 papers 2 months ago

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Paper • 2504.15271 • Published Apr 21 • 65

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 274

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 129

upvoted 2 papers 3 months ago

Synthetic Video Enhances Physical Fidelity in Video Synthesis

Paper • 2503.20822 • Published Mar 26 • 16

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Paper • 2503.21755 • Published Mar 27 • 34

upvoted a paper 4 months ago

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5 • 45

upvoted a collection 4 months ago

EgoLife

Collection

CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 19

upvoted a paper 4 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 145

upvoted 2 papers 5 months ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 126