---
license: apache-2.0
---

# **VMem: Consistent Video Scene Generation with Surfel-Indexed View Memory**

<p align="center">
   📖 <a href="https://v-mem.github.io/">Project Page</a>   |   🖥️ <a href="https://github.com/runjiali-rl/vmem">GitHub</a>   |   🤗 <a href="https://huggingface.co/spaces/liguang0115/vmem">Hugging Face</a>   |   📑 <a href="https://arxiv.org/abs/2506.18903v1">Paper</a>
</p>

## Model Details

VMem is a plug-and-play memory mechanism for **consistent autoregressive scene / novel-view video generation**.

Key idea: anchor past frames to **surface elements (surfels)** in a global scene memory. At every step, the target camera pose queries this memory for the most relevant past views; these retrieved views are combined (via Plücker-line embeddings and a VAE) with noise to synthesize the next frame. The generated frame is then written back into the memory, yielding long-term geometric consistency while keeping compute low, since only a few retrieved views condition each step.
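To make the read/retrieve/write cycle concrete, here is a minimal sketch of a surfel-indexed view memory. It is illustrative only, not the released implementation: the names (`Surfel`, `SurfelViewMemory`), the voting-based retrieval, and the dot-product visibility test are simplifying assumptions; the actual system maintains richer surfel attributes and resolves visibility more carefully.

```python
import numpy as np

class Surfel:
    """Hypothetical surfel record: a surface point plus the past frames that observed it."""
    def __init__(self, position, normal, frame_ids=()):
        self.position = np.asarray(position, dtype=float)  # 3D point on a scene surface
        self.normal = np.asarray(normal, dtype=float)      # outward unit normal
        self.frame_ids = set(frame_ids)                    # indices of past frames seeing this point

class SurfelViewMemory:
    """Toy surfel-indexed view memory (names and scoring are illustrative)."""

    def __init__(self):
        self.surfels = []

    def write(self, frame_id, new_surfels):
        """Register a newly generated frame. In practice the surfels would come
        from the frame's estimated geometry and be fused into the existing map;
        here we simply append them."""
        for s in new_surfels:
            s.frame_ids.add(frame_id)
            self.surfels.append(s)

    def read(self, cam_position, cam_forward, k=4):
        """Score past frames by how many surfels are plausibly visible from the
        target pose, and return the ids of the top-k most relevant views."""
        votes = {}
        cam_position = np.asarray(cam_position, dtype=float)
        cam_forward = np.asarray(cam_forward, dtype=float)
        for s in self.surfels:
            offset = s.position - cam_position
            dist = np.linalg.norm(offset)
            if dist < 1e-8:
                continue
            ray = offset / dist
            # Crude visibility proxy: the surfel lies in front of the camera
            # and roughly faces it. A real check would also handle occlusion.
            if np.dot(ray, cam_forward) > 0.5 and np.dot(s.normal, -ray) > 0.0:
                for fid in s.frame_ids:
                    votes[fid] = votes.get(fid, 0) + 1
        return sorted(votes, key=votes.get, reverse=True)[:k]
```

In the full pipeline, the frames returned by `read` condition the generator for the next step, and the synthesized result is handed back to `write`; compute per step therefore depends on the few retrieved views rather than on the total video length.
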
| | |
|---|---|
| **Developed by** | Runjia Li, Philip Torr, Andrea Vedaldi, Tomas Jakab |
| **Affiliation** | University of Oxford |
| **First released** | arXiv pre-print, 2025 |
| **Model type** | Generative CV (diffusion / autoregressive latent generator with external surfel memory) |
| **Modality** | Images → video (RGB); camera pose conditioning |
| **License** | Apache-2.0 |

---

### Direct Use

- **Novel-view video generation** from a single image or a short camera sweep, driven by a user-specified camera pose trajectory (see the pose-embedding sketch after this list).

- **Scene roaming** in AR/VR: move a virtual camera through a captured room while preserving layout and textures.

- **Video editing / completion** where long-term geometry must stay stable.
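
In all three uses, the camera poses enter the generator as the Plücker-line (ray) embeddings mentioned above. Below is a minimal, self-contained sketch of the standard construction of a per-pixel Plücker map from a pinhole camera; the exact conventions (axis orientation, normalization) in VMem's code may differ, and `plucker_ray_map` is an illustrative name.

```python
import numpy as np

def plucker_ray_map(K, c2w, height, width):
    """Per-pixel Plücker embedding for a pinhole camera (standard construction;
    VMem's exact conventions may differ).

    K   : (3, 3) camera intrinsics
    c2w : (4, 4) camera-to-world extrinsics
    Returns a (height, width, 6) array of [direction, moment] per pixel."""
    # Pixel-center grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)

    # Unproject to camera-space ray directions, rotate into world space.
    dirs = (pix @ np.linalg.inv(K).T) @ c2w[:3, :3].T          # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)       # unit directions d

    # Plücker coordinates of each pixel ray: (d, m) with moment m = o x d,
    # where o is the camera center.
    origin = c2w[:3, 3]
    moments = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moments], axis=-1)            # (H, W, 6)

# Example: identity pose (camera at the origin) with a 256x256 image.
K = np.array([[256.0, 0.0, 128.0],
              [0.0, 256.0, 128.0],
              [0.0, 0.0, 1.0]])
emb = plucker_ray_map(K, np.eye(4), 256, 256)  # (256, 256, 6) conditioning map
```

The pair (d, o × d) identifies each pixel's viewing ray as a line in 3-D space independent of where along the ray it is parameterized, which is why Plücker maps are a common way to feed camera poses to a generator.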