Add modal update docs
- README.md +122 -0
- modal/README.md +48 -0
- modal/flux-trellis-GGUF-text-to-3d.py +448 -0
README.md
CHANGED
@@ -19,5 +19,127 @@ Team members: castlebbs@ stargarnet@ zinkenite@
19 |
- Recording of Gradio use: https://www.youtube.com/watch?v=JbpwDwk8IcI
|
20 |
- GitHub: https://github.com/castlebbs/gradio-mcp-hackathon
|
21 |
|
22 |
+
# 🎮 3D Scene Asset Generator
|
23 |
+
### Our participation in the 2025 Gradio Agent MCP Hackathon
|
24 |
+
|
25 |
+
> Transform player biographies into personalized 3D environments using LLM-powered analysis and 3D asset generation pipelines.
|
26 |
+
|
27 |
+
## 🌟 Project Overview
|
28 |
+
|
29 |
+
This hackathon project creates a 3D scene generator that analyzes player biographies and automatically generates personalized 3D environments. By combining the power of LLM analysis with generation models (FLUX + Trellis), we create unique, contextual 3D assets that reflect each player's personality, interests, and background.
|
30 |
+
|
31 |
+
## ✨ Key Features
|
32 |
+
|
33 |
+
- **🤖 AI-Powered Analysis**: LLM analyzes player biographies to understand personality and interests
|
34 |
+
- **🎨 3D Generation**: FLUX + Trellis pipeline generates high-quality, contextual 3D assets
|
35 |
+
- **🌐 Interactive Web Interface**: Gradio interface with real-time generation and examples
|
36 |
+
- **🔧 MCP Integration**: Supports Model Context Protocol for enhanced interactions
|
37 |
+
- **⚡ Optimized Pipeline**: Uses GGUF quantization and LoRA models for fast, efficient generation
|
38 |
+
- **📱 User-Friendly**: Simple input → AI analysis → 3D asset generation workflow
|
39 |
+
|
40 |
+
|
41 |
+
|
42 |
+
### Technology Stack
|
43 |
+
|
44 |
+
- **Frontend**: Gradio with custom CSS styling
|
45 |
+
- **AI Analysis**: Anthropic Claude Sonnet 4
|
46 |
+
- **3D Generation**: FLUX + Trellis on Modal
|
47 |
+
- **Output Format**: GLB (3D models compatible with most engines)
|
48 |
+
|
49 |
+
## 🚀 Quick Start
|
50 |
+
|
51 |
+
### Prerequisites
|
52 |
+
|
53 |
+
- Python 3.8+
|
54 |
+
- Anthropic API key
|
55 |
+
- Modal account
|
56 |
+
|
57 |
+
### Installation
|
58 |
+
|
59 |
+
1. **Clone the repository**
|
60 |
+
|
61 |
+
2. **Set up the Gradio application**
|
62 |
+
```bash
|
63 |
+
cd gradio
|
64 |
+
pip install -r requirements.txt
|
65 |
+
```
|
66 |
+
|
67 |
+
3. **Configure API keys**
|
68 |
+
```bash
|
69 |
+
export ANTHROPIC_API_KEY="your-anthropic-api-key"
|
70 |
+
```
|
71 |
+
|
72 |
+
4. **Set up Modal**
|
73 |
+
```bash
|
74 |
+
modal setup
|
75 |
+
```
|
76 |
+
|
77 |
+
5. **Deploy the Modal function**
|
78 |
+
```bash
|
79 |
+
cd ../modal
|
80 |
+
modal deploy flux-trellis-GGUF-text-to-3d.py
|
81 |
+
```
|
82 |
+
|
83 |
+
6. **Run the application**
|
84 |
+
```bash
|
85 |
+
|
86 |
+
python app.py
|
87 |
+
```
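
Once the Modal function is deployed, it can also be called from any Python process, not only from the Gradio app. The sketch below assumes the app name `flux-trellis-gguf-3d-pipeline` and the function name `text_to_3d` from `modal/flux-trellis-GGUF-text-to-3d.py`, and that the returned dictionary carries `generated_image` (PNG bytes) and `glb_file` (GLB bytes) as that script returns them. `save_outputs` and `generate_asset` are hypothetical helper names, and `modal.Function.from_name` is assumed to be available in the installed Modal client version.

```python
import os


def save_outputs(result: dict, output_dir: str = "modal_outputs") -> list:
    """Write the image and GLB bytes from a text_to_3d result to disk.

    Returns the list of file paths that were written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    # Keys and file names mirror the local entrypoint of the Modal script.
    for key, filename in (
        ("generated_image", "generated_image.png"),
        ("glb_file", "model.glb"),
    ):
        if key in result:
            path = os.path.join(output_dir, filename)
            with open(path, "wb") as f:
                f.write(result[key])
            written.append(path)
    return written


def generate_asset(text_prompt: str, seed: int = 1) -> dict:
    """Call the deployed Modal function (requires `modal deploy` first)."""
    # Imported lazily so save_outputs can be used without Modal installed.
    import modal

    # Assumption: app and function names match the deployed script.
    text_to_3d = modal.Function.from_name(
        "flux-trellis-gguf-3d-pipeline", "text_to_3d"
    )
    return text_to_3d.remote(text_prompt=text_prompt, seed=seed)
```

A typical call would be `save_outputs(generate_asset("A isometric 3D dragon with two heads, white background"))`, which needs a configured Modal account and a deployed app.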
|
88 |
+
|
89 |
+
## 💡 Usage Example
|
90 |
+
|
91 |
+
**Input Biography:**
|
92 |
+
> "Marcus is a tech enthusiast and gaming streamer who loves mechanical keyboards and collecting vintage arcade games. He's also a coffee connoisseur who roasts his own beans and enjoys late-night coding sessions."
|
93 |
+
|
94 |
+
**Generated 3D Assets:**
|
95 |
+
- Vintage arcade cabinet with classic game artwork
|
96 |
+
- Premium mechanical keyboard with RGB backlighting
|
97 |
+
- Professional coffee roasting station with custom setup
|
98 |
+
- Gaming chair with LED accents and streaming equipment
|
99 |
+
- Retro-futuristic desk lamp with adjustable lighting
|
100 |
+
|
101 |
+
## 📁 Project Structure
|
102 |
+
|
103 |
+
```
|
104 |
+
/
|
105 |
+
├── app.py # Core application logic
|
106 |
+
├── requirements.txt # Python dependencies
|
107 |
+
├── README.md # Detailed app documentation
|
108 |
+
├── images/ # UI assets and examples
|
109 |
+
├── modal/ # Modal cloud functions
|
110 |
+
│ ├── flux-trellis-GGUF-text-to-3d.py # 3D generation pipeline
|
111 |
+
│ └── README.md # Modal setup documentation
|
112 |
+
├── godot/ # Godot test application
|
113 |
+
├── LICENSE # MIT License
|
114 |
+
└── README.md # This file
|
115 |
+
```
|
116 |
+
|
117 |
+
## 🔧 Technical Details
|
118 |
+
|
119 |
+
### AI Pipeline
|
120 |
+
- **Text Analysis**: Claude Sonnet processes biographical text to extract personality traits and interests
|
121 |
+
- **Prompt Generation**: AI creates detailed, contextual prompts for 3D asset generation
|
122 |
+
- **Asset Creation**: FLUX + Trellis pipeline generates high-quality 3D models
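
The three stages above can be sketched as a small orchestration function. This is a minimal illustration rather than the code in `app.py`: `build_scene_assets`, `analyze_biography`, and `generate_3d` are hypothetical names, and the two stages are passed in as callables so the sketch runs without an API key or a GPU.

```python
from typing import Callable, Dict, List


def build_scene_assets(
    biography: str,
    analyze_biography: Callable[[str], List[str]],
    generate_3d: Callable[[str], Dict],
    max_assets: int = 5,
) -> List[Dict]:
    """Run biography -> asset prompts -> 3D assets.

    analyze_biography stands in for the LLM step (text analysis plus prompt
    generation); generate_3d stands in for the FLUX + TRELLIS call.
    """
    prompts = analyze_biography(biography)[:max_assets]
    assets = []
    for prompt in prompts:
        result = generate_3d(prompt)
        assets.append({"prompt": prompt, "result": result})
    return assets


# Stub stages (hypothetical) so the flow can be exercised offline.
def fake_analyze(biography: str) -> List[str]:
    return ["Vintage arcade cabinet", "Mechanical keyboard"]


def fake_generate(prompt: str) -> Dict:
    return {"text_prompt": prompt, "glb_file": b""}


assets = build_scene_assets(
    "Marcus is a gaming streamer.", fake_analyze, fake_generate
)
print([a["prompt"] for a in assets])
```

Injecting the stages keeps the control flow testable on its own; in a real app the callables would wrap the Claude request and the remote Modal call.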
|
123 |
+
|
124 |
+
### Optimizations
|
125 |
+
- **GGUF Quantization**: Reduces model size while maintaining quality
|
126 |
+
- **LoRA Models**: Hyper FLUX 8Steps for faster inference, Game Assets LoRA for better 3D results
|
127 |
+
- **Modal Scaling**: Automatic scaling for concurrent requests
|
128 |
+
|
129 |
+
## 🏆 Hackathon Team
|
130 |
+
|
131 |
+
- castlebbs@
|
132 |
+
- stargarnet@
|
133 |
+
- zinkenite@
|
134 |
+
|
135 |
+
Built with ❤️ for the 2025 Gradio Agent MCP Hackathon
|
136 |
+
|
137 |
+
## Links
|
138 |
+
|
139 |
+
|
140 |
+
- https://huggingface.co/black-forest-labs/FLUX.1-dev
|
141 |
+
- https://huggingface.co/microsoft/TRELLIS-image-large
|
142 |
+
- https://huggingface.co/spaces/gokaygokay/Flux-TRELLIS (thank you to gokaygokay for the GGUF)
|
143 |
+
|
144 |
|
145 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
modal/README.md
ADDED
@@ -0,0 +1,48 @@
1 |
+
|
2 |
+
## Modal remote function
|
3 |
+
This code deploys the `text_to_3d` remote function to Modal.
|
4 |
+
|
5 |
+
### FLUX Pipeline Models
|
6 |
+
|
7 |
+
Here is a description of the different models used in this pipeline.
|
8 |
+
|
9 |
+
1. **T5 Text Encoder**
|
10 |
+
- Type: Transformer-based text encoder
|
11 |
+
- Function: Converts text prompts to embeddings
|
12 |
+
- Optimization: 8-bit quantization via BitsAndBytesConfig
|
13 |
+
|
14 |
+
2. **FLUX Transformer2D Model**
|
15 |
+
- Type: Diffusion transformer for image generation
|
16 |
+
- Function: Core image generation from text embeddings
|
17 |
+
- Optimization: GGUF q8_0 quantization (from https://huggingface.co/gokaygokay)
|
18 |
+
- LoRA Models Applied:
|
19 |
+
- **Hyper FLUX 8Steps LoRA**: Reduces inference steps from ~20-50 to 8
|
20 |
+
- **gokaygokay's Flux Game Assets LoRA**: Improves 3D asset generation quality
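
To give a rough sense of what q8_0 quantization saves, the sketch below estimates weight storage. It assumes the GGUF `Q8_0` layout (blocks of 32 int8 weights sharing one 16-bit scale, so 34 bytes per 32 weights) and a transformer of roughly 12 billion parameters, the figure Black Forest Labs states for FLUX.1-dev. Real GGUF files keep some tensors at higher precision, so treat the numbers as approximations.

```python
def q8_0_bytes(num_params: int) -> int:
    """Approximate storage for num_params weights in GGUF Q8_0."""
    block_size = 32            # weights per block
    bytes_per_block = 32 + 2   # 32 int8 values + one fp16 scale
    num_blocks = -(-num_params // block_size)  # ceiling division
    return num_blocks * bytes_per_block


def bf16_bytes(num_params: int) -> int:
    """Storage for the same weights in bfloat16 (2 bytes each)."""
    return num_params * 2


params = 12_000_000_000  # assumed parameter count for the FLUX transformer
print(f"bf16: {bf16_bytes(params) / 1e9:.2f} GB")
print(f"q8_0: {q8_0_bytes(params) / 1e9:.2f} GB")
```

Under these assumptions the quantized weights take a little over half the bfloat16 footprint (8.5 bits per weight instead of 16).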
|
21 |
+
|
22 |
+
### TRELLIS Pipeline Models
|
23 |
+
3. **TRELLIS-image-large**
|
24 |
+
- Type: Multi-stage 3D generation model
|
25 |
+
- Function: Converts 2D images to 3D representations
|
26 |
+
- Components:
|
27 |
+
- **Sparse Structure Sampler**: Initial 3D structure generation
|
28 |
+
- **SLAT (Structured Latent) Sampler**: 3D refinement and detail enhancement
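
Each of the two samplers takes a step count and a guidance strength. Below is a minimal sketch of assembling those settings, assuming the parameter names (`sparse_structure_sampler_params`, `slat_sampler_params`, `steps`, `cfg_strength`) and default values used in `flux-trellis-GGUF-text-to-3d.py`. `trellis_sampler_kwargs` is a hypothetical helper, not part of TRELLIS.

```python
def trellis_sampler_kwargs(
    ss_sampling_steps: int = 12,
    ss_guidance_strength: float = 7.5,
    slat_sampling_steps: int = 12,
    slat_guidance_strength: float = 3.0,
) -> dict:
    """Build the keyword arguments for the two TRELLIS sampling stages."""
    if ss_sampling_steps < 1 or slat_sampling_steps < 1:
        raise ValueError("sampling steps must be at least 1")
    return {
        # Stage 1: coarse sparse 3D structure.
        "sparse_structure_sampler_params": {
            "steps": ss_sampling_steps,
            "cfg_strength": ss_guidance_strength,
        },
        # Stage 2: structured-latent refinement and detail.
        "slat_sampler_params": {
            "steps": slat_sampling_steps,
            "cfg_strength": slat_guidance_strength,
        },
    }


kwargs = trellis_sampler_kwargs(slat_sampling_steps=20)
print(kwargs)
```

In the Modal script these two dictionaries are passed to `trellis_pipeline.run(...)` alongside `seed`, `formats=["gaussian", "mesh"]`, and `preprocess_image=False`, so the helper's result could be unpacked with `**kwargs` in that call.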
|
29 |
+
|
30 |
+
### Supporting Models
|
31 |
+
4. **U2NET (via rembg)**
|
32 |
+
- Type: Salient object detection model
|
33 |
+
- Function: Background removal for clean 3D generation
|
34 |
+
- Used by: rembg library in TRELLIS preprocessing
|
35 |
+
|
36 |
+
### Post-processing Components
|
37 |
+
5. **Mesh Simplification Algorithm**
|
38 |
+
- Function: Reduces polygon count (95% reduction by default)
|
39 |
+
6. **Texture Generator**
|
40 |
+
- Function: Creates 1024x1024 textures for 3D models
|
41 |
+
7. **GLB Exporter**
|
42 |
+
- Function: Combines the Gaussian splat and mesh outputs into a GLB file
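
As a quick illustration of the default settings, the sketch below works out how many faces remain after simplification. It assumes the simplification ratio is the fraction of faces removed (0.95 is the `mesh_simplify` default in the Modal script). `faces_after_simplify` is a hypothetical helper, and the input face count is made up for the example.

```python
def faces_after_simplify(num_faces: int, simplify: float = 0.95) -> int:
    """Estimate the face count left after removing a fraction of faces."""
    if not 0.0 <= simplify < 1.0:
        raise ValueError("simplify must be in [0, 1)")
    # Keep at least one face so the estimate never drops to zero.
    return max(1, round(num_faces * (1.0 - simplify)))


remaining = faces_after_simplify(500_000)  # hypothetical raw mesh size
print(f"Faces kept at the default ratio: {remaining}")
```

The actual count depends on the simplification algorithm and mesh topology, so this is only an order-of-magnitude guide when choosing a ratio for a target engine.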
|
43 |
+
|
44 |
+
### Pipeline Sequence
|
45 |
+

|
46 |
+
|
47 |
+
|
48 |
+
|
modal/flux-trellis-GGUF-text-to-3d.py
ADDED
@@ -0,0 +1,448 @@
1 |
+
"""
|
2 |
+
Modal app for FLUX + TRELLIS text-to-image-to-3D generation.
|
3 |
+
Uses GGUF from https://huggingface.co/gokaygokay: Flux Game Assets LoRA + Hyper FLUX 8Steps LoRA
|
4 |
+
We did not use the Flux Game Assets LoRA trigger word as we were happy with the results without it.
|
5 |
+
|
6 |
+
We experimented with TRELLIS text-to-3D models (TRELLIS-text-large, TRELLIS-text-xlarge), but we obtained
|
7 |
+
much better results with the image-to-3D pipeline using Flux.1-dev (with the LoRA weights applied) to generate
|
8 |
+
the initial image from prompt.
|
9 |
+
|
10 |
+
With the current memory management, this container, when kept warm, uses less than 30GB of memory
|
11 |
+
and was tested successfully on A100-40 for many successive inferences without memory leaks.
|
12 |
+
On a warm container, we obtained a 3D asset in less than 30 seconds, which is a good result given there
|
13 |
+
is no optimization in this pipeline yet.
|
14 |
+
|
15 |
+
Created for participation in the Hugging Face 🤖 Gradio Agents & MCP Hackathon 2025 🚀
|
16 |
+
https://huggingface.co/Agents-MCP-Hackathon
|
17 |
+
|
18 |
+
Author: castlebbs
|
19 |
+
"""
|
20 |
+
|
21 |
+
import os
|
22 |
+
import modal
|
23 |
+
import gc
|
24 |
+
|
25 |
+
cuda_version = "12.1.1"
|
26 |
+
flavor = "devel"
|
27 |
+
operating_sys = "ubuntu22.04"
|
28 |
+
tag = f"{cuda_version}-{flavor}-{operating_sys}"
|
29 |
+
|
30 |
+
image = (
|
31 |
+
modal.Image.from_registry(f"nvidia/cuda:{tag}", add_python="3.10")
|
32 |
+
.apt_install([
|
33 |
+
"git", "wget", "libgl1-mesa-glx", "libglib2.0-0", "libsm6",
|
34 |
+
"libxext6", "libxrender-dev", "libgomp1", "libjpeg-dev",
|
35 |
+
"build-essential", "ninja-build", "cmake",
|
36 |
+
])
|
37 |
+
.env({
|
38 |
+
"CUDA_HOME": "/usr/local/cuda-12.1",
|
39 |
+
"PYTHONPATH": "/trellis",
|
40 |
+
"TORCH_CUDA_ARCH_LIST": "8.0",
|
41 |
+
"SPCONV_ALGO": "native",
|
42 |
+
})
|
43 |
+
.pip_install(
|
44 |
+
["torch==2.4.0", "torchvision==0.19.0", "torchaudio==2.4.0"],
|
45 |
+
extra_options="--index-url https://download.pytorch.org/whl/cu121",
|
46 |
+
)
|
47 |
+
.pip_install([
|
48 |
+
"packaging", "wheel", "setuptools", "pillow", "imageio", "imageio-ffmpeg",
|
49 |
+
"tqdm", "easydict", "opencv-python-headless", "scipy", "ninja", "rembg",
|
50 |
+
"onnxruntime", "trimesh", "open3d", "xatlas", "pyvista", "pymeshfix",
|
51 |
+
"igraph", "transformers", "accelerate", "safetensors", "fastapi[standard]",
|
52 |
+
"diffusers>=0.30.0", "bitsandbytes", "sentencepiece", "peft", "gguf",
|
53 |
+
])
|
54 |
+
.pip_install([
|
55 |
+
"https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post2/flash_attn-2.7.0.post2+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
|
56 |
+
])
|
57 |
+
.pip_install(["git+https://github.com/EasternJournalist/utils3d.git@9a4eb15e4021b67b12c460c7057d642626897ec8"])
|
58 |
+
.pip_install(["kaolin"], extra_options="-f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html")
|
59 |
+
.pip_install(["xformers==0.0.27.post2"], extra_options="--index-url https://download.pytorch.org/whl/cu121")
|
60 |
+
.pip_install(["spconv-cu121"])
|
61 |
+
.pip_install(["git+https://github.com/NVlabs/nvdiffrast.git"])
|
62 |
+
.run_commands([
|
63 |
+
"mkdir -p /tmp/extensions",
|
64 |
+
"git clone https://github.com/autonomousvision/mip-splatting.git /tmp/extensions/mip-splatting",
|
65 |
+
"export CXX=g++ && export CC=gcc && pip install /tmp/extensions/mip-splatting/submodules/diff-gaussian-rasterization/",
|
66 |
+
"git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git /trellis",
|
67 |
+
])
|
68 |
+
.workdir("/trellis")
|
69 |
+
)
|
70 |
+
|
71 |
+
app = modal.App("flux-trellis-gguf-3d-pipeline", image=image)
|
72 |
+
|
73 |
+
models_volume = modal.Volume.from_name("trellis-models", create_if_missing=True)
|
74 |
+
torch_hub_volume = modal.Volume.from_name("torch-hub-cache", create_if_missing=True)
|
75 |
+
rembg_volume = modal.Volume.from_name("rembg-models", create_if_missing=True)
|
76 |
+
|
77 |
+
# Global variables to cache pipelines
|
78 |
+
_flux_pipeline = None
|
79 |
+
_trellis_pipeline = None
|
80 |
+
_generation_count = 0
|
81 |
+
|
82 |
+
def cleanup_memory():
|
83 |
+
"""Aggressive memory cleanup without deleting pipelines"""
|
84 |
+
global _flux_pipeline, _trellis_pipeline
|
85 |
+
|
86 |
+
try:
|
87 |
+
import torch
|
88 |
+
except ImportError:
|
89 |
+
return
|
90 |
+
|
91 |
+
# Clear Flux pipeline caches
|
92 |
+
if _flux_pipeline is not None:
|
93 |
+
# Clear transformer attention caches
|
94 |
+
if hasattr(_flux_pipeline, 'transformer'):
|
95 |
+
if hasattr(_flux_pipeline.transformer, 'clear_cache'):
|
96 |
+
_flux_pipeline.transformer.clear_cache()
|
97 |
+
|
98 |
+
# Clear scheduler state
|
99 |
+
if hasattr(_flux_pipeline, 'scheduler'):
|
100 |
+
_flux_pipeline.scheduler.timesteps = None
|
101 |
+
if hasattr(_flux_pipeline.scheduler, 'sigmas'):
|
102 |
+
_flux_pipeline.scheduler.sigmas = None
|
103 |
+
|
104 |
+
# Clear TRELLIS pipeline caches
|
105 |
+
if _trellis_pipeline is not None:
|
106 |
+
# Clear any cached preprocessor states
|
107 |
+
if hasattr(_trellis_pipeline, 'image_processor'):
|
108 |
+
# Force clear any rembg cached tensors
|
109 |
+
try:
|
110 |
+
import rembg
|
111 |
+
if hasattr(rembg, 'bg_remover'):
|
112 |
+
rembg.bg_remover = None
|
113 |
+
except ImportError:
|
114 |
+
pass
|
115 |
+
|
116 |
+
# Force garbage collection
|
117 |
+
gc.collect()
|
118 |
+
|
119 |
+
# CUDA cleanup
|
120 |
+
if torch.cuda.is_available():
|
121 |
+
torch.cuda.empty_cache()
|
122 |
+
torch.cuda.synchronize()
|
123 |
+
torch.cuda.empty_cache()
|
124 |
+
|
125 |
+
def periodic_pipeline_reset():
|
126 |
+
"""Reset pipelines every 100 generations to prevent memory accumulation"""
|
127 |
+
global _flux_pipeline, _trellis_pipeline, _generation_count
|
128 |
+
|
129 |
+
_generation_count += 1
|
130 |
+
|
131 |
+
if _generation_count % 100 == 0:
|
132 |
+
print(f"Performing periodic pipeline reset after {_generation_count} generations")
|
133 |
+
|
134 |
+
# Delete pipelines
|
135 |
+
if _flux_pipeline is not None:
|
136 |
+
del _flux_pipeline
|
137 |
+
_flux_pipeline = None
|
138 |
+
|
139 |
+
if _trellis_pipeline is not None:
|
140 |
+
del _trellis_pipeline
|
141 |
+
_trellis_pipeline = None
|
142 |
+
|
143 |
+
# Aggressive cleanup
|
144 |
+
gc.collect()
|
145 |
+
|
146 |
+
# Import torch for CUDA cleanup
|
147 |
+
try:
|
148 |
+
import torch
|
149 |
+
if torch.cuda.is_available():
|
150 |
+
torch.cuda.empty_cache()
|
151 |
+
torch.cuda.synchronize()
|
152 |
+
except ImportError:
|
153 |
+
pass
|
154 |
+
|
155 |
+
print("Pipeline reset completed")
|
156 |
+
|
157 |
+
def get_flux_pipeline():
|
158 |
+
"""Get or initialize Flux pipeline"""
|
159 |
+
global _flux_pipeline
|
160 |
+
|
161 |
+
if _flux_pipeline is None:
|
162 |
+
import torch
|
163 |
+
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig
|
164 |
+
from transformers import T5EncoderModel, BitsAndBytesConfig
|
165 |
+
|
166 |
+
print("Loading Flux pipeline...")
|
167 |
+
dtype = torch.bfloat16
|
168 |
+
|
169 |
+
file_url = "https://huggingface.co/gokaygokay/flux-game/blob/main/hyperflux_00001_.q8_0.gguf"
|
170 |
+
single_file_base_model = "camenduru/FLUX.1-dev-diffusers"
|
171 |
+
|
172 |
+
quantization_config = BitsAndBytesConfig(
|
173 |
+
load_in_8bit=True, bnb_8bit_compute_dtype=torch.bfloat16
|
174 |
+
)
|
175 |
+
text_encoder_2 = T5EncoderModel.from_pretrained(
|
176 |
+
single_file_base_model,
|
177 |
+
subfolder="text_encoder_2",
|
178 |
+
torch_dtype=dtype,
|
179 |
+
quantization_config=quantization_config,
|
180 |
+
cache_dir="/models/hf_cache",
|
181 |
+
)
|
182 |
+
|
183 |
+
# Cache the GGUF file locally first
|
184 |
+
gguf_cache_path = "/models/gguf_cache/hyperflux_00001_.q8_0.gguf"
|
185 |
+
os.makedirs("/models/gguf_cache", exist_ok=True)
|
186 |
+
|
187 |
+
# Only download if not already cached
|
188 |
+
if not os.path.exists(gguf_cache_path):
|
189 |
+
print("Downloading GGUF file to cache...")
|
190 |
+
from huggingface_hub import hf_hub_download
|
191 |
+
downloaded_path = hf_hub_download(
|
192 |
+
repo_id="gokaygokay/flux-game",
|
193 |
+
filename="hyperflux_00001_.q8_0.gguf",
|
194 |
+
cache_dir="/models/hf_cache"
|
195 |
+
)
|
196 |
+
# Copy to our persistent location
|
197 |
+
import shutil
|
198 |
+
shutil.copy2(downloaded_path, gguf_cache_path)
|
199 |
+
print(f"GGUF file cached at {gguf_cache_path}")
|
200 |
+
else:
|
201 |
+
print(f"Using cached GGUF file at {gguf_cache_path}")
|
202 |
+
|
203 |
+
transformer = FluxTransformer2DModel.from_single_file(
|
204 |
+
gguf_cache_path,
|
205 |
+
subfolder="transformer",
|
206 |
+
quantization_config=GGUFQuantizationConfig(compute_dtype=dtype),
|
207 |
+
torch_dtype=dtype,
|
208 |
+
config=single_file_base_model,
|
209 |
+
)
|
210 |
+
|
211 |
+
_flux_pipeline = FluxPipeline.from_pretrained(
|
212 |
+
single_file_base_model,
|
213 |
+
transformer=transformer,
|
214 |
+
text_encoder_2=text_encoder_2,
|
215 |
+
torch_dtype=dtype,
|
216 |
+
)
|
217 |
+
_flux_pipeline.to("cuda")
|
218 |
+
print("Flux pipeline loaded")
|
219 |
+
|
220 |
+
return _flux_pipeline
|
221 |
+
|
222 |
+
def get_trellis_pipeline(trellis_model_name):
|
223 |
+
"""Get or initialize TRELLIS pipeline"""
|
224 |
+
global _trellis_pipeline
|
225 |
+
|
226 |
+
if _trellis_pipeline is None:
|
227 |
+
print(f"Loading TRELLIS pipeline: {trellis_model_name}")
|
228 |
+
from trellis.pipelines import TrellisImageTo3DPipeline
|
229 |
+
|
230 |
+
_trellis_pipeline = TrellisImageTo3DPipeline.from_pretrained(trellis_model_name)
|
231 |
+
_trellis_pipeline.cuda()
|
232 |
+
print("TRELLIS pipeline loaded")
|
233 |
+
|
234 |
+
return _trellis_pipeline
|
235 |
+
|
236 |
+
@app.function(
|
237 |
+
gpu="A100",
|
238 |
+
volumes={
|
239 |
+
"/models": models_volume,
|
240 |
+
"/cache/torch_hub": torch_hub_volume,
|
241 |
+
"/cache/rembg": rembg_volume,
|
242 |
+
},
|
243 |
+
secrets=[modal.Secret.from_name("huggingface")],
|
244 |
+
scaledown_window=300,
|
245 |
+
timeout=3600,
|
246 |
+
memory=32768,
|
247 |
+
enable_memory_snapshot=True,
|
248 |
+
)
|
249 |
+
def text_to_3d(
|
250 |
+
text_prompt: str,
|
251 |
+
trellis_model_name: str = "JeffreyXiang/TRELLIS-image-large",
|
252 |
+
seed: int = 1,
|
253 |
+
image_width: int = 1024,
|
254 |
+
image_height: int = 1024,
|
255 |
+
guidance_scale: float = 3.5,
|
256 |
+
num_inference_steps: int = 8,
|
257 |
+
ss_guidance_strength: float = 7.5,
|
258 |
+
ss_sampling_steps: int = 12,
|
259 |
+
slat_guidance_strength: float = 3.0,
|
260 |
+
slat_sampling_steps: int = 12,
|
261 |
+
mesh_simplify: float = 0.95,
|
262 |
+
texture_size: int = 1024,
|
263 |
+
) -> dict:
|
264 |
+
"""Generate 3D assets from text prompts using FLUX + TRELLIS pipeline."""
|
265 |
+
|
266 |
+
import torch
|
267 |
+
import numpy as np
|
268 |
+
from PIL import Image
|
269 |
+
from trellis.utils import postprocessing_utils
|
270 |
+
from io import BytesIO
|
271 |
+
|
272 |
+
# Set environment variables
|
273 |
+
os.environ.update({
|
274 |
+
"SPCONV_ALGO": "native",
|
275 |
+
"ATTN_BACKEND": "flash-attn",
|
276 |
+
"TORCH_CUDA_ARCH_LIST": "8.0",
|
277 |
+
"HF_HOME": "/models/hf_cache",
|
278 |
+
"HF_DATASETS_CACHE": "/models/hf_cache",
|
279 |
+
"HF_HUB_CACHE": "/models/hf_cache",
|
280 |
+
"TORCH_HOME": "/cache/torch_hub",
|
281 |
+
"U2NET_HOME": "/cache/rembg",
|
282 |
+
})
|
283 |
+
|
284 |
+
# Create cache directories
|
285 |
+
for path in ["/models/hf_cache", "/cache/torch_hub", "/cache/rembg"]:
|
286 |
+
os.makedirs(path, exist_ok=True)
|
287 |
+
|
288 |
+
print("Starting pipeline initialization...")
|
289 |
+
|
290 |
+
# Check if we need periodic reset
|
291 |
+
periodic_pipeline_reset()
|
292 |
+
|
293 |
+
# Memory tracking
|
294 |
+
initial_memory = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
|
295 |
+
print(f"Initial GPU memory: {initial_memory / 1e9:.2f}GB")
|
296 |
+
|
297 |
+
try:
|
298 |
+
# Get pipelines (will load if not cached)
|
299 |
+
flux_pipeline = get_flux_pipeline()
|
300 |
+
trellis_pipeline = get_trellis_pipeline(trellis_model_name)
|
301 |
+
|
302 |
+
# Generate image with explicit memory management
|
303 |
+
print(f"Generating image from prompt: '{text_prompt}'")
|
304 |
+
device = "cuda"
|
305 |
+
|
306 |
+
with torch.no_grad():
|
307 |
+
generator = torch.Generator(device=device).manual_seed(seed)
|
308 |
+
|
309 |
+
generated_image = flux_pipeline(
|
310 |
+
prompt=text_prompt,
|
311 |
+
guidance_scale=guidance_scale,
|
312 |
+
num_inference_steps=num_inference_steps,
|
313 |
+
width=image_width,
|
314 |
+
height=image_height,
|
315 |
+
generator=generator,
|
316 |
+
).images[0]
|
317 |
+
|
318 |
+
# Clear generator and intermediate tensors
|
319 |
+
del generator
|
320 |
+
cleanup_memory()
|
321 |
+
|
322 |
+
print("Image generation completed successfully")
|
323 |
+
|
324 |
+
# Preprocess image for TRELLIS
|
325 |
+
print("Preprocessing image for 3D generation...")
|
326 |
+
with torch.no_grad():
|
327 |
+
processed_image = trellis_pipeline.preprocess_image(generated_image)
|
328 |
+
|
329 |
+
cleanup_memory() # Clear rembg intermediate tensors
|
330 |
+
print("Image preprocessing completed")
|
331 |
+
|
332 |
+
# Generate 3D from image
|
333 |
+
print("Generating 3D asset from image...")
|
334 |
+
outputs = trellis_pipeline.run(
|
335 |
+
processed_image,
|
336 |
+
seed=seed,
|
337 |
+
formats=["gaussian", "mesh"],
|
338 |
+
preprocess_image=False,
|
339 |
+
sparse_structure_sampler_params={
|
340 |
+
"steps": ss_sampling_steps,
|
341 |
+
"cfg_strength": ss_guidance_strength,
|
342 |
+
},
|
343 |
+
slat_sampler_params={
|
344 |
+
"steps": slat_sampling_steps,
|
345 |
+
"cfg_strength": slat_guidance_strength,
|
346 |
+
},
|
347 |
+
)
|
348 |
+
|
349 |
+
cleanup_memory() # Clear 3D generation intermediate tensors
|
350 |
+
print("3D generation completed successfully")
|
351 |
+
|
352 |
+
# Prepare result
|
353 |
+
result = {
|
354 |
+
"text_prompt": text_prompt,
|
355 |
+
"seed": seed,
|
356 |
+
"trellis_model_name": trellis_model_name,
|
357 |
+
"image_generation_params": {
|
358 |
+
"width": image_width,
|
359 |
+
"height": image_height,
|
360 |
+
"guidance_scale": guidance_scale,
|
361 |
+
"num_inference_steps": num_inference_steps,
|
362 |
+
},
|
363 |
+
"3d_generation_params": {
|
364 |
+
"ss_guidance_strength": ss_guidance_strength,
|
365 |
+
"ss_sampling_steps": ss_sampling_steps,
|
366 |
+
"slat_guidance_strength": slat_guidance_strength,
|
367 |
+
"slat_sampling_steps": slat_sampling_steps,
|
368 |
+
},
|
369 |
+
}
|
370 |
+
|
371 |
+
# Save generated image
|
372 |
+
img_buffer = BytesIO()
|
373 |
+
generated_image.save(img_buffer, format="PNG")
|
374 |
+
result["generated_image"] = img_buffer.getvalue()
|
375 |
+
|
376 |
+
# Generate GLB file
|
377 |
+
if outputs.get("gaussian") and outputs.get("mesh"):
|
378 |
+
print("Generating GLB file...")
|
379 |
+
glb = postprocessing_utils.to_glb(
|
380 |
+
outputs["gaussian"][0],
|
381 |
+
outputs["mesh"][0],
|
382 |
+
simplify=mesh_simplify,
|
383 |
+
texture_size=texture_size,
|
384 |
+
)
|
385 |
+
result["glb_file"] = glb.export(file_type="glb")
|
386 |
+
print("GLB generation completed successfully")
|
387 |
+
else:
|
388 |
+
print("Warning: Both gaussian and mesh outputs required for GLB generation")
|
389 |
+
|
390 |
+
# Final cleanup
|
391 |
+
cleanup_memory()
|
392 |
+
|
393 |
+
final_memory = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
|
394 |
+
print(f"Final GPU memory: {final_memory / 1e9:.2f}GB")
|
395 |
+
print(f"Memory delta: {(final_memory - initial_memory) / 1e6:.1f}MB")
|
396 |
+
|
397 |
+
return result
|
398 |
+
|
399 |
+
except Exception as e:
|
400 |
+
print(f"Error during generation: {e}")
|
401 |
+
cleanup_memory()
|
402 |
+
raise
|
403 |
+
|
404 |
+
@app.local_entrypoint()
|
405 |
+
def main(
|
406 |
+
text_prompt: str = "A isometric 3D dragon with two heads, white background",
|
407 |
+
trellis_model_name: str = "JeffreyXiang/TRELLIS-image-large",
|
408 |
+
seed: int = 1,
|
409 |
+
):
|
410 |
+
"""Local entrypoint for testing the text-to-image-to-3D generation."""
|
411 |
+
print(f"Starting text-to-image-to-3D generation...")
|
412 |
+
print(f"Prompt: {text_prompt}")
|
413 |
+
print(f"TRELLIS Model: {trellis_model_name}")
|
414 |
+
print(f"Seed: {seed}")
|
415 |
+
|
416 |
+
result = text_to_3d.remote(
|
417 |
+
text_prompt=text_prompt,
|
418 |
+
trellis_model_name=trellis_model_name,
|
419 |
+
seed=seed,
|
420 |
+
)
|
421 |
+
|
422 |
+
print(f"Generation completed!")
|
423 |
+
print(f"Result keys: {list(result.keys())}")
|
424 |
+
|
425 |
+
import os
|
426 |
+
output_dir = "modal_outputs"
|
427 |
+
os.makedirs(output_dir, exist_ok=True)
|
428 |
+
|
429 |
+
if "generated_image" in result:
|
430 |
+
with open(os.path.join(output_dir, "generated_image.png"), "wb") as f:
|
431 |
+
f.write(result["generated_image"])
|
432 |
+
print(f"Saved: {output_dir}/generated_image.png")
|
433 |
+
|
434 |
+
if "glb_file" in result:
|
435 |
+
with open(os.path.join(output_dir, "model.glb"), "wb") as f:
|
436 |
+
f.write(result["glb_file"])
|
437 |
+
print(f"Saved: {output_dir}/model.glb")
|
438 |
+
|
439 |
+
return result
|
440 |
+
|
441 |
+
if __name__ == "__main__":
|
442 |
+
import sys
|
443 |
+
prompt = sys.argv[1] if len(sys.argv) > 1 else "A isometric 3D dragon with two heads, white background"
|
444 |
+
model = sys.argv[2] if len(sys.argv) > 2 else "JeffreyXiang/TRELLIS-image-large"
|
445 |
+
seed = int(sys.argv[3]) if len(sys.argv) > 3 else 1
|
446 |
+
|
447 |
+
with app.run():
|
448 |
+
main(prompt, model, seed)
|