Large Language Models are Temporal and Causal Reasoners for Video Question Answering Paper • 2310.15747 • Published Oct 24, 2023 • 1
HOTR: End-to-End Human-Object Interaction Detection with Transformers Paper • 2104.13682 • Published Apr 28, 2021
DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations Paper • 2401.12517 • Published Jan 23, 2024 • 1
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models Paper • 2308.09363 • Published Aug 18, 2023
Distribution-Aware Prompt Tuning for Vision-Language Models Paper • 2309.03406 • Published Sep 6, 2023 • 1
Semantic-Aware Implicit Template Learning via Part Deformation Consistency Paper • 2308.11916 • Published Aug 23, 2023
Read-only Prompt Optimization for Vision-Language Few-shot Learning Paper • 2308.14960 • Published Aug 29, 2023 • 3
Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection Paper • 2204.04836 • Published Apr 11, 2022
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers Paper • 2403.10030 • Published Mar 15, 2024 • 1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer Paper • 2403.13347 • Published Mar 20, 2024 • 1
Robust Camera Pose Refinement for Multi-Resolution Hash Encoding Paper • 2302.01571 • Published Feb 3, 2023
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 30
Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems Paper • 2407.16125 • Published Jul 23, 2024
LLaMo: Large Language Model-based Molecular Graph Assistant Paper • 2411.00871 • Published Oct 31, 2024 • 23
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality Paper • 2411.15241 • Published Nov 22, 2024 • 7
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models Paper • 2503.19355 • Published Mar 25, 2025 • 1
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Paper • 2506.07464 • Published Jun 9, 2025 • 12