nielsr (HF Staff) committed on
Commit 9bd80c6 · verified · 1 Parent(s): c0ee84d

Add pipeline tag image-to-3d

This PR adds the `pipeline_tag: image-to-3d` metadata to improve searchability of the model on the Hugging Face Hub.
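
As a rough illustration of how a consumer might see this metadata, the sketch below reads the YAML front matter of a model card and pulls out `pipeline_tag`. The helper name `read_front_matter` is hypothetical, and the parser is an assumption-laden simplification that only handles flat `key: value` lines; a real tool would use a proper YAML library.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse flat `key: value` pairs from a leading `---` front-matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return {}  # front matter was opened but never closed


# Sample card mirroring the front matter after this commit.
card = """---
license: mit
pipeline_tag: image-to-3d
---

# Direct3D-S2
"""

print(read_front_matter(card).get("pipeline_tag"))  # prints: image-to-3d
```

Before this commit the same lookup would return `None`, since the front matter contained only `license: mit`.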

Files changed (1)
  1. README.md +12 -4
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  license: mit
+ pipeline_tag: image-to-3d
  ---

-
  # Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

  <div align="center">
@@ -27,7 +27,7 @@ license: mit

  ## 📝 Abstract

- Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce <strong class="has-text-weight-bold">Direct3D‑S2</strong>, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the <strong class="has-text-weight-bold">Spatial Sparse Attention (SSA)</strong> mechanism, which greatly enhances the efficiency of Diffusion Transformer (DiT) computations on sparse volumetric data. SSA allows the model to effectively process large token sets within sparse volumes, substantially reducing computational overhead and achieving a <em>3.9&times;</em> speedup in the forward pass and a <em>9.6&times;</em> speedup in the backward pass. Our framework also includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. Compared to previous methods with heterogeneous representations in 3D VAE, this unified design significantly improves training efficiency and stability. Our model is trained on public available datasets, and experiments demonstrate that <strong class="has-text-weight-bold">Direct3D‑S2</strong> not only surpasses state-of-the-art methods in generation quality and efficiency, but also enables <strong class="has-text-weight-bold">training at 1024<sup>3</sup> resolution with just 8 GPUs</strong>, a task typically requiring at least 32 GPUs for volumetric representations at 256<sup>3</sup> resolution, thus making gigascale 3D generation both practical and accessible.
+ Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce **Direct3D‑S2**, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the **Spatial Sparse Attention (SSA)** mechanism, which greatly enhances the efficiency of Diffusion Transformer (DiT) computations on sparse volumetric data. SSA allows the model to effectively process large token sets within sparse volumes, substantially reducing computational overhead and achieving a *3.9×* speedup in the forward pass and a *9.6×* speedup in the backward pass. Our framework also includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. Compared to previous methods with heterogeneous representations in 3D VAE, this unified design significantly improves training efficiency and stability. Our model is trained on public available datasets, and experiments demonstrate that **Direct3D‑S2** not only surpasses state-of-the-art methods in generation quality and efficiency, but also enables **training at 1024<sup>3</sup> resolution with just 8 GPUs**, a task typically requiring at least 32 GPUs for volumetric representations at 256<sup>3</sup> resolution, thus making gigascale 3D generation both practical and accessible.

  ## 🌟 Highlight

@@ -39,7 +39,15 @@ Generating high-resolution 3D shapes using volumetric representations such as Si

  ### Installation

- ```sh
+ - Install `pytorch >= 2.1.0` and `torchvision` first. You can refer to the [official installation guide](https://pytorch.org/get-started/locally/) for more details.
+
+ ```bash
+ python -m pip install torch torchvision
+ ```
+
+ - Install dependencies:
+
+ ```bash
  git clone https://github.com/DreamTechAI/Direct3D-S2.git

  cd Direct3D-S2
@@ -100,4 +108,4 @@ If you find our work useful, please consider citing our paper:
  journal={arXiv preprint arXiv:2505.17412},
  year={2025}
  }
- ```
+ ```
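
The first installation bullet added in this diff requires `pytorch >= 2.1.0`. A minimal sketch of checking that requirement before running the remaining install steps is shown below. The helpers `release_tuple` and `meets_minimum` are hypothetical names, and the comparison is a simplification that looks only at the leading numeric release segment, so local suffixes such as `+cu121` and pre-release markers are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple:
    """Return the leading numeric release, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "2.1.0") -> bool:
    """Compare release tuples, padding with zeros so '2.1' equals '2.1.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("torch")  # distribution name as installed by pip
    print("torch", installed, "ok" if meets_minimum(installed) else "too old")
except PackageNotFoundError:
    print("torch is not installed")
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking `2.10.0` below `2.9.0`.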