wushuang98 committed (verified)
Commit c0ee84d · 1 Parent(s): 2e06d49

Update README.md

Files changed (1):
  1. README.md +103 -3
README.md CHANGED
@@ -1,3 +1,103 @@
- ---
- license: mit
- ---
---
license: mit
---

# Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

<div align="center">
<a href="https://www.neural4d.com/research/direct3d-s2" target="_blank"><img src="https://img.shields.io/badge/Project%20Page-333399.svg?logo=googlehome" height="22px"></a>
<a href="https://huggingface.co/spaces/wushuang98/Direct3D-S2-v1.0-demo" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg" height="22px"></a>
<a href="https://huggingface.co/wushuang98/Direct3D-S2" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg" height="22px"></a>
<a href="https://arxiv.org/pdf/2505.17412" target="_blank"><img src="https://img.shields.io/badge/Arxiv-b5212f.svg?logo=arxiv" height="22px"></a>
</div>

<div style="background: #fff; box-shadow: 0 4px 12px rgba(0,0,0,.15); display: inline-block; padding: 0px;">
<img id="teaser" src="assets/teaserv6.png" alt="Teaser image of Direct3D-S2"/>
</div>

---

## ✨ News

- May 30, 2025: 🤯 We have released both v1.0 and v1.1. The new model's attention is even faster than FlashAttention-2, with a **12.2×** faster forward pass and a **19.7×** faster backward pass, yielding nearly a **2×** inference speedup over v1.0.
- May 30, 2025: 🔨 Released the inference code and models.
- May 26, 2025: 🎁 Released a live demo on 🤗 [Hugging Face](https://huggingface.co/spaces/wushuang98/Direct3D-S2-v1.0-demo).
- May 26, 2025: 🚀 Released the paper and project page.

## 📝 Abstract

Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce **Direct3D‑S2**, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the **Spatial Sparse Attention (SSA)** mechanism, which greatly enhances the efficiency of Diffusion Transformer (DiT) computations on sparse volumetric data. SSA allows the model to effectively process large token sets within sparse volumes, substantially reducing computational overhead and achieving a *3.9×* speedup in the forward pass and a *9.6×* speedup in the backward pass. Our framework also includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. Compared to previous 3D VAEs that use heterogeneous representations across these stages, this unified design significantly improves training efficiency and stability. Our model is trained on publicly available datasets, and experiments demonstrate that **Direct3D‑S2** not only surpasses state-of-the-art methods in generation quality and efficiency, but also enables **training at 1024<sup>3</sup> resolution with just 8 GPUs**, a task that typically requires at least 32 GPUs for volumetric representations at 256<sup>3</sup> resolution. This makes gigascale 3D generation both practical and accessible.

## 🌟 Highlights

- **Gigascale 3D Generation**: Direct3D-S2 enables training at 1024<sup>3</sup> resolution with only 8 GPUs.
- **Spatial Sparse Attention (SSA)**: A novel attention mechanism designed for sparse volumetric data, enabling efficient processing of large token sets (a rough illustration of the idea follows below).
- **Unified Sparse VAE**: A variational autoencoder that maintains a consistent sparse volumetric format across input, latent, and output stages, improving training efficiency and stability.

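To make the SSA idea concrete, here is a minimal, illustrative PyTorch sketch of attention restricted to spatially local blocks of occupied voxels. It is **not** the released SSA implementation (the mechanism in the paper is considerably more involved); it only shows why the cost scales with the tokens inside each occupied block rather than with the full dense volume.

```python
# Illustrative sketch only: block-local attention over sparse voxel tokens.
import torch
import torch.nn.functional as F

def spatial_block_ids(coords: torch.Tensor, block_size: int = 8) -> torch.Tensor:
    """Map integer voxel coordinates (N, 3) to one coarse spatial block id per token."""
    blocks = coords // block_size                      # coarse block index along each axis
    return blocks[:, 0] * 1_000_000 + blocks[:, 1] * 1_000 + blocks[:, 2]

def block_local_attention(x: torch.Tensor, coords: torch.Tensor, block_size: int = 8) -> torch.Tensor:
    """x: (N, C) features of occupied voxels; coords: (N, 3) their integer positions."""
    out = torch.empty_like(x)
    ids = spatial_block_ids(coords, block_size)
    for b in ids.unique():                             # one self-attention call per occupied block
        sel = ids == b
        q = k = v = x[sel].unsqueeze(0)                # (1, n_b, C) with n_b << N
        out[sel] = F.scaled_dot_product_attention(q, k, v).squeeze(0)
    return out

# Toy example: 4,096 occupied voxels in a 128^3 grid with 64-dim features.
coords = torch.randint(0, 128, (4096, 3))
feats = torch.randn(4096, 64)
print(block_local_attention(feats, coords).shape)  # torch.Size([4096, 64])
```
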
## 🚀 Getting Started

### Installation

```sh
git clone https://github.com/DreamTechAI/Direct3D-S2.git

cd Direct3D-S2

pip install -r requirements.txt

pip install -e .
```

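As a quick sanity check that the editable install is importable, you can try the same import used in the Usage section below:

```python
# Quick post-install check: run inside the environment you installed into.
from direct3d_s2.pipeline import Direct3DS2Pipeline
print("Direct3D-S2 import OK")
```
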
### Usage

```python
from direct3d_s2.pipeline import Direct3DS2Pipeline

pipeline = Direct3DS2Pipeline.from_pretrained(
    'wushuang98/Direct3D-S2',
    subfolder="direct3d-s2-v-1-1"
).to("cuda:0")

mesh = pipeline(
    'assets/test/13.png',
    sdf_resolution=1024,  # 512 or 1024
    remesh=False,  # Switch to True if you need to reduce the number of triangles.
)["mesh"]

mesh.export('output.obj')
```

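For example, to run the same pipeline over a folder of images at the lower 512 resolution, a sketch that only reuses the calls shown above (assuming `pipeline` is already constructed; adjust the input and output paths to your setup):

```python
from pathlib import Path

# Batch over every PNG in a folder; 512 resolution is lighter on memory,
# and remesh=True reduces the triangle count of each exported mesh.
for image_path in sorted(Path("assets/test").glob("*.png")):
    mesh = pipeline(
        str(image_path),
        sdf_resolution=512,
        remesh=True,
    )["mesh"]
    mesh.export(f"{image_path.stem}.obj")
```
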
### Web Demo

We provide a Gradio web demo for Direct3D-S2, which allows you to generate 3D meshes from images interactively.

```bash
python app.py
```

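If you would rather embed the pipeline in your own interface than use the bundled `app.py`, a minimal Gradio wrapper could look like the sketch below. This is an illustration only, not the contents of `app.py`; it reuses the pipeline API from the Usage section and standard Gradio components.

```python
import gradio as gr
from direct3d_s2.pipeline import Direct3DS2Pipeline

# Load the released image-to-3D pipeline (same call as in the Usage section).
pipeline = Direct3DS2Pipeline.from_pretrained(
    "wushuang98/Direct3D-S2",
    subfolder="direct3d-s2-v-1-1",
).to("cuda:0")

def generate(image_path: str) -> str:
    # Run image-to-3D and return a mesh file that Gradio's 3D viewer can display.
    mesh = pipeline(image_path, sdf_resolution=512, remesh=True)["mesh"]
    mesh.export("output.obj")
    return "output.obj"

demo = gr.Interface(
    fn=generate,
    inputs=gr.Image(type="filepath"),
    outputs=gr.Model3D(),
    title="Direct3D-S2",
)
demo.launch()
```
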
## 🤗 Acknowledgements

Thanks to the following repositories for their great work, which helped us a lot in the development of Direct3D-S2:

- [Trellis](https://github.com/microsoft/TRELLIS)
- [SparseFlex](https://github.com/VAST-AI-Research/TripoSF)
- [native-sparse-attention-triton](https://github.com/XunhaoLai/native-sparse-attention-triton)
- [diffusers](https://github.com/huggingface/diffusers)

## 📄 License

Direct3D-S2 is released under the MIT License. See [LICENSE](LICENSE) for details.

## 📖 Citation

If you find our work useful, please consider citing our paper:

```bibtex
@article{wu2025direct3ds2gigascale3dgeneration,
  title={Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention},
  author={Shuang Wu and Youtian Lin and Feihu Zhang and Yifei Zeng and Yikang Yang and Yajie Bao and Jiachen Qian and Siyu Zhu and Philip Torr and Xun Cao and Yao Yao},
  journal={arXiv preprint arXiv:2505.17412},
  year={2025}
}
```