Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
yliu-cs
AI & ML interests
Multi-Modal Learning
Recent Activity
upvoted a paper 3 days ago
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
upvoted a paper 4 days ago
HoliTom: Holistic Token Merging for Fast Video Large Language Models
authored a paper 11 days ago
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Organizations
None yet