---
license: apache-2.0
---

# **VMem: Consistent Video Scene Generation with Surfel-Indexed View Memory**

<p align="center">
   📖 <a href="https://v-mem.github.io/">Project Page</a>   |   🖥️ <a href="https://github.com/runjiali-rl/vmem">GitHub</a>   |   🤗 <a href="https://huggingface.co/spaces/liguang0115/vmem">Hugging Face</a>   |   📑 <a href="https://arxiv.org/abs/2506.18903v1">Paper</a>
</p>

## Model Details

VMem is a plug-and-play memory mechanism for **consistent autoregressive scene / novel-view video generation**.

Key idea: anchor past frames to **surface elements (surfels)** in a global scene memory. At every step, the target camera pose queries this memory for the most relevant past views; these retrieved views are combined (via Plücker-line embeddings and a VAE) with noise to synthesize the next frame. The generated frame is then written back into the memory, yielding long-term geometric consistency while keeping compute low, since only a few retrieved views condition each step.
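To make the read/retrieve/write cycle concrete, here is a minimal sketch of a surfel-indexed view memory. It is illustrative only, not the released implementation: the names (`Surfel`, `SurfelViewMemory`), the voting-based retrieval, and the dot-product visibility test are simplifying assumptions; the actual system maintains richer surfel attributes and resolves visibility more carefully.

```python
import numpy as np

class Surfel:
    """Hypothetical surfel record: a surface point plus the past frames that observed it."""
    def __init__(self, position, normal, frame_ids=()):
        self.position = np.asarray(position, dtype=float)  # 3D point on a scene surface
        self.normal = np.asarray(normal, dtype=float)      # outward unit normal
        self.frame_ids = set(frame_ids)                    # indices of past frames seeing this point

class SurfelViewMemory:
    """Toy surfel-indexed view memory (names and scoring are illustrative)."""

    def __init__(self):
        self.surfels = []

    def write(self, frame_id, new_surfels):
        """Register a newly generated frame. In practice the surfels would come
        from the frame's estimated geometry and be fused into the existing map;
        here we simply append them."""
        for s in new_surfels:
            s.frame_ids.add(frame_id)
            self.surfels.append(s)

    def read(self, cam_position, cam_forward, k=4):
        """Score past frames by how many surfels are plausibly visible from the
        target pose, and return the ids of the top-k most relevant views."""
        votes = {}
        cam_position = np.asarray(cam_position, dtype=float)
        cam_forward = np.asarray(cam_forward, dtype=float)
        for s in self.surfels:
            offset = s.position - cam_position
            dist = np.linalg.norm(offset)
            if dist < 1e-8:
                continue
            ray = offset / dist
            # Crude visibility proxy: the surfel lies in front of the camera
            # and roughly faces it. A real check would also handle occlusion.
            if np.dot(ray, cam_forward) > 0.5 and np.dot(s.normal, -ray) > 0.0:
                for fid in s.frame_ids:
                    votes[fid] = votes.get(fid, 0) + 1
        return sorted(votes, key=votes.get, reverse=True)[:k]
```

In the full pipeline, the frames returned by `read` condition the generator for the next step, and the synthesized result is handed back to `write`; compute per step therefore depends on the few retrieved views rather than on the total video length.
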
| | |
|---|---|
| **Developed by** | Runjia Li, Philip Torr, Andrea Vedaldi, Tomas Jakab |
| **Affiliation** | University of Oxford |
| **First released** | arXiv pre-print, 2025 |
| **Model type** | Generative CV (diffusion / autoregressive latent generator with external surfel memory) |
| **Modality** | Images → video (RGB); camera pose conditioning |
| **License** | Apache-2.0 |

---

### Direct Use

- **Novel-view video generation** from a single image or a short camera sweep, driven by a user-specified camera pose trajectory (see the pose-embedding sketch after this list).

- **Scene roaming** in AR/VR: move a virtual camera through a captured room while preserving layout and textures.

- **Video editing / completion** where long-term geometry must stay stable.
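
In all three uses, the camera poses enter the generator as the Plücker-line (ray) embeddings mentioned above. Below is a minimal, self-contained sketch of the standard construction of a per-pixel Plücker map from a pinhole camera; the exact conventions (axis orientation, normalization) in VMem's code may differ, and `plucker_ray_map` is an illustrative name.

```python
import numpy as np

def plucker_ray_map(K, c2w, height, width):
    """Per-pixel Plücker embedding for a pinhole camera (standard construction;
    VMem's exact conventions may differ).

    K   : (3, 3) camera intrinsics
    c2w : (4, 4) camera-to-world extrinsics
    Returns a (height, width, 6) array of [direction, moment] per pixel."""
    # Pixel-center grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)

    # Unproject to camera-space ray directions, rotate into world space.
    dirs = (pix @ np.linalg.inv(K).T) @ c2w[:3, :3].T          # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)       # unit directions d

    # Plücker coordinates of each pixel ray: (d, m) with moment m = o x d,
    # where o is the camera center.
    origin = c2w[:3, 3]
    moments = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moments], axis=-1)            # (H, W, 6)

# Example: identity pose (camera at the origin) with a 256x256 image.
K = np.array([[256.0, 0.0, 128.0],
              [0.0, 256.0, 128.0],
              [0.0, 0.0, 1.0]])
emb = plucker_ray_map(K, np.eye(4), 256, 256)  # (256, 256, 6) conditioning map
```

The pair (d, o × d) identifies each pixel's viewing ray as a line in 3-D space independent of where along the ray it is parameterized, which is why Plücker maps are a common way to feed camera poses to a generator.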