---
license: apache-2.0
---

# **VMem: Consistent Video Scene Generation with Surfel-Indexed View Memory**

📖 Project Page | 🖥️ GitHub | 🤗 Hugging Face | 📑 Paper
## Model Details

VMem is a plug-and-play memory mechanism for **consistent autoregressive scene / novel-view video generation**. The key idea is to anchor past frames to **surface elements (surfels)** in a global memory. At every generation step, the target camera pose queries this memory for the most relevant past views, which are combined (via Plücker-line embeddings and a VAE) with noise to synthesize the next frame. The generated frame is then written back into the memory, yielding long-term geometric consistency while keeping compute low. A toy sketch of this read/write loop is given at the end of this section.

| | |
|---|---|
| **Developed by** | Runjia Li, Philip Torr, Andrea Vedaldi, Tomas Jakab |
| **Affiliation** | University of Oxford |
| **First released** | arXiv pre-print, 2025 |
| **Model type** | Generative CV (diffusion / autoregressive latent generator with external surfel memory) |
| **Modality** | Images → video (RGB); camera pose conditioning |
| **License** | Apache-2.0 |

---

### Direct Use

- **Novel-view video generation** from a single image or a short camera sweep.
- **Scene roaming** in AR/VR: move a virtual camera through a captured room while preserving layout and textures.
- **Video editing / completion** where long-term geometry must stay stable.
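To make the surfel-indexed retrieval concrete, here is a minimal NumPy sketch of the memory's read/write loop. This is an illustration under simplifying assumptions, not the project's actual API: the names (`SurfelViewMemory`, `write`, `read`) are hypothetical, surfels are reduced to bare 3D points, and occlusion handling, surfel normals/radii, and the diffusion-model conditioning (Plücker embeddings, VAE) are all omitted.

```python
import numpy as np


class SurfelViewMemory:
    """Toy surfel-indexed view memory (illustrative sketch only).

    Each stored frame is anchored to a set of surfels (here: bare 3D
    points tagged with the index of the frame that observed them).
    A query projects all surfels into the target camera and retrieves
    the k past frames whose surfels are most visible from that pose.
    """

    def __init__(self, k: int = 4):
        self.k = k
        self.surfel_xyz = np.empty((0, 3))            # surfel positions, world frame
        self.surfel_src = np.empty((0,), dtype=int)   # index of the source frame
        self.frames: list[np.ndarray] = []            # stored RGB frames (H, W, 3)

    def write(self, frame: np.ndarray, surfels_xyz: np.ndarray) -> None:
        """Anchor a newly generated frame to its surfels."""
        idx = len(self.frames)
        self.frames.append(frame)
        self.surfel_xyz = np.vstack([self.surfel_xyz, surfels_xyz])
        self.surfel_src = np.concatenate(
            [self.surfel_src, np.full(len(surfels_xyz), idx)])

    def read(self, K: np.ndarray, w2c: np.ndarray,
             hw: tuple[int, int]) -> list[np.ndarray]:
        """Return the k most relevant past frames for a target camera pose."""
        h, w = hw
        # Transform surfels into the target camera frame and project them.
        pts_h = np.hstack(
            [self.surfel_xyz, np.ones((len(self.surfel_xyz), 1))])
        cam = (w2c @ pts_h.T).T[:, :3]
        uv = (K @ cam.T).T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
        visible = ((cam[:, 2] > 1e-6)
                   & (uv[:, 0] >= 0) & (uv[:, 0] < w)
                   & (uv[:, 1] >= 0) & (uv[:, 1] < h))
        # Vote: frames whose surfels land inside the target view score higher.
        votes = np.bincount(self.surfel_src[visible],
                            minlength=len(self.frames))
        top = np.argsort(votes)[::-1][: self.k]
        return [self.frames[i] for i in top if votes[i] > 0]
```

In the full system, the frames returned by `read` condition the video diffusion model for the next step, and the generated frame's surfels (estimated from predicted geometry) are written back with `write`, closing the autoregressive loop.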