Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams Paper • 2406.08085 • Published Jun 12, 2024 • 14
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 23
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published Apr 30, 2024 • 27
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published 2 days ago • 26
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18, 2024 • 31
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published 24 days ago • 19