prisar's Collections: video-analysis
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (arXiv:2507.13353)
Kwai Keye-VL Technical Report (arXiv:2507.01949)
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (arXiv:2507.11336)
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers (arXiv:1906.02792)
Rethinking the Evaluation of Video Summaries (arXiv:1903.11328)
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA (arXiv:2009.08043)
Video Representation Learning with Visual Tempo Consistency (arXiv:2006.15489)
Video Representation Learning by Recognizing Temporal Transformations (arXiv:2007.10730)
Align and Prompt: Video-and-Language Pre-training with Entity Prompts (arXiv:2112.09583)
arXiv:2106.13230
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training (arXiv:2212.14546)
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning (arXiv:2302.09473)
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos (arXiv:2303.08345)
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos (arXiv:2303.07224)
VideoXum: Cross-modal Visual and Textural Summarization of Videos (arXiv:2303.12060)
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (arXiv:2308.09363)
Multi-event Video-Text Retrieval (arXiv:2308.11551)
Video ReCap: Recursive Captioning of Hour-Long Videos (arXiv:2402.13250)
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis (arXiv:2405.21075)
Video Editing for Video Retrieval (arXiv:2402.02335)
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering (arXiv:2401.10711)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning (arXiv:2404.12353)
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion (arXiv:2308.06685)
Moving Object Based Collision-Free Video Synopsis (arXiv:2401.02419)
HawkEye: Training Video-Text LLMs for Grounding Text in Videos (arXiv:2403.10228)
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models (arXiv:2311.16103)
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation (arXiv:2312.00220)
Conditional Modeling Based Automatic Video Summarization (arXiv:2311.12159)
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos (arXiv:2404.17571)
vid-TLDR: Training Free Token merging for Light-weight Video Transformer (arXiv:2403.13347)
Boost Video Frame Interpolation via Motion Adaptation (arXiv:2306.13933)
VcLLM: Video Codecs are Secretly Tensor Codecs (arXiv:2407.00467)
Video Understanding with Large Language Models: A Survey (arXiv:2312.17432)
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding (arXiv:2403.09626)
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models (arXiv:2306.05424)
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models (arXiv:2311.13435)
Towards Retrieval Augmented Generation over Large Video Libraries (arXiv:2406.14938)