metadata

tags:
  - Image-to-3D
  - Image-to-4D
  - GenXD
language:
  - en
  - zh
base_model:
  - stabilityai/stable-video-diffusion-img2vid-xt
pipeline_tag: image-to-3d
license: apache-2.0
datasets:
  - Yuyang-z/CamVid-30K

GenXD Model Card

Model Details


Model Description

GenXD uses a mask-latent-conditioned diffusion model to generate 3D and 4D samples from both camera and image conditions. In addition, multiview-temporal modules with alpha-fusing are proposed to disentangle and fuse multiview and temporal information effectively.

Developed by: NUS, Microsoft
Model type: image-to-3D diffusion model, image-to-video diffusion model, image-to-4D diffusion model
License: Apache-2.0
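
The two ideas named in the description, mask latent conditioning and alpha-fusing, can be illustrated with a small NumPy sketch. This is not the GenXD implementation. The function names (`mask_latent_condition`, `alpha_fuse`), the tensor layout `(views, channels, height, width)`, the channel-wise concatenation, and the convex-blend form of the fusion are all assumptions made for illustration; see the repository for the actual code.

```python
import numpy as np


def mask_latent_condition(noisy_latents, cond_latents, cond_mask):
    """Hypothetical sketch of mask latent conditioning.

    noisy_latents, cond_latents: arrays of shape (views, channels, h, w).
    cond_mask: boolean array of shape (views,), True where a view is given
    as an image condition. Assumed layout, not taken from the GenXD code.
    """
    # Expand the per-view mask so it broadcasts over channels and pixels.
    mask = cond_mask.astype(noisy_latents.dtype)[:, None, None, None]
    # Zero out latents of views that are not provided as conditions.
    masked_cond = cond_latents * mask
    # One extra channel tells the network which views are conditioned.
    views, _, h, w = noisy_latents.shape
    mask_channel = np.broadcast_to(mask, (views, 1, h, w))
    return np.concatenate([noisy_latents, masked_cond, mask_channel], axis=1)


def alpha_fuse(multiview_feat, temporal_feat, alpha):
    """Hypothetical sketch of alpha-fusing as a convex blend.

    alpha = 0 keeps only multiview features; larger alpha mixes in
    more temporal information. The blend form is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * multiview_feat + alpha * temporal_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(2, 4, 3, 3))
    cond = rng.normal(size=(2, 4, 3, 3))
    # Only the first view is supplied as an image condition.
    x = mask_latent_condition(noisy, cond, np.array([True, False]))
    print(x.shape)  # (2, 9, 3, 3): 4 noisy + 4 condition + 1 mask channel
    fused = alpha_fuse(noisy, cond, alpha=0.25)
    print(fused.shape)
```

In this sketch a single scalar weight controls how much temporal information is mixed into the multiview features, so one set of modules could in principle serve both static (3D) and dynamic (4D) inputs.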

Model Sources

Project Page: https://gen-x-d.github.io
Repository: https://github.com/HeliosZhao/GenXD
Paper: https://arxiv.org/abs/2411.02319
Data: https://huggingface.co/datasets/Yuyang-z/CamVid-30K

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is beyond its abilities and out of scope.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model does not achieve perfect 3D consistency.

Bias

While the capabilities of generative models are impressive, they can also reinforce or exacerbate social biases.