Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16 • 42
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 30
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 28
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19, 2024 • 22
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19, 2024 • 26
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 25
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024 • 17
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 38