castlebbs committed on
Commit ba63423 · 1 Parent(s): 2e9f06e

Add modal update docs

Files changed (3)
  1. README.md +122 -0
  2. modal/README.md +48 -0
  3. modal/flux-trellis-GGUF-text-to-3d.py +448 -0
README.md CHANGED
@@ -19,5 +19,127 @@ Team members: castlebbs@ stargarnet@ zinkenite@
19
  - Recording of Gradio use: https://www.youtube.com/watch?v=JbpwDwk8IcI
20
  - GitHub: https://github.com/castlebbs/gradio-mcp-hackathon
21
 
22
+ # 🎮 3D Scene Asset Generator
23
+ ### Our entry in the 2025 Gradio Agents & MCP Hackathon
24
+
25
+ > Transform player biographies into personalized 3D environments using LLM-powered analysis and a 3D asset generation model pipeline.
26
+
27
+ ## 🌟 Project Overview
28
+
29
+ This hackathon project is a 3D scene generator that analyzes player biographies and automatically produces personalized 3D environments. By combining LLM analysis with generation models (FLUX + TRELLIS), it creates unique, contextual 3D assets that reflect each player's personality, interests, and background.
30
+
31
+ ## ✨ Key Features
32
+
33
+ - **🤖 AI-Powered Analysis**: LLM analyzes player biographies to understand personality and interests
34
+ - **🎨 3D Generation**: FLUX + Trellis pipeline generates high-quality, contextual 3D assets
35
+ - **🌐 Interactive Web Interface**: Gradio interface with real-time generation and examples
36
+ - **🔧 MCP Integration**: Supports Model Context Protocol for enhanced interactions
37
+ - **⚡ Optimized Pipeline**: Uses GGUF quantization and LoRA models for fast, efficient generation
38
+ - **📱 User-Friendly**: Simple input → AI analysis → 3D asset generation workflow
39
+
40
+
41
+
42
+ ### Technology Stack
43
+
44
+ - **Frontend**: Gradio with custom CSS styling
45
+ - **AI Analysis**: Anthropic Claude Sonnet 4
46
+ - **3D Generation**: FLUX + Trellis on Modal
47
+ - **Output Format**: GLB (3D models compatible with most engines)
48
+
49
+ ## 🚀 Quick Start
50
+
51
+ ### Prerequisites
52
+
53
+ - Python 3.8+
54
+ - Anthropic API key
55
+ - Modal account
56
+
57
+ ### Installation
58
+
59
+ 1. **Clone the repository**: `git clone https://github.com/castlebbs/gradio-mcp-hackathon.git`
60
+
61
+ 2. **Set up the Gradio application**
62
+ ```bash
63
+ cd gradio
64
+ pip install -r requirements.txt
65
+ ```
66
+
67
+ 3. **Configure API keys**
68
+ ```bash
69
+ export ANTHROPIC_API_KEY="your-anthropic-api-key"
70
+ ```
71
+
72
+ 4. **Set up Modal**
73
+ ```bash
74
+ modal setup
75
+ ```
76
+
77
+ 5. **Deploy the Modal function**
78
+ ```bash
79
+ cd ../modal
80
+ modal deploy flux-trellis-GGUF-text-to-3d.py
81
+ ```
82
+
83
+ 6. **Run the application**
84
+ ```bash
+ python app.py
87
+ ```
88
+
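+ With the Modal function deployed (step 5), the Gradio app invokes it remotely. A minimal sketch of that call, assuming the app and function names defined in `modal/flux-trellis-GGUF-text-to-3d.py` (the actual call site lives in `app.py` and may differ; older Modal clients use `modal.Function.lookup` instead of `from_name`):
+
+ ```python
+ import modal
+
+ # Look up the deployed function by app name and function name (as defined in the Modal script)
+ text_to_3d = modal.Function.from_name("flux-trellis-gguf-3d-pipeline", "text_to_3d")
+
+ # Run remotely and save the returned GLB bytes
+ result = text_to_3d.remote(text_prompt="Vintage arcade cabinet with classic game artwork", seed=1)
+ with open("asset.glb", "wb") as f:
+     f.write(result["glb_file"])
+ ```
+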
89
+ ## 💡 Usage Example
90
+
91
+ **Input Biography:**
92
+ > "Marcus is a tech enthusiast and gaming streamer who loves mechanical keyboards and collecting vintage arcade games. He's also a coffee connoisseur who roasts his own beans and enjoys late-night coding sessions."
93
+
94
+ **Generated 3D Assets:**
95
+ - Vintage arcade cabinet with classic game artwork
96
+ - Premium mechanical keyboard with RGB backlighting
97
+ - Professional coffee roasting station with custom setup
98
+ - Gaming chair with LED accents and streaming equipment
99
+ - Retro-futuristic desk lamp with adjustable lighting
100
+
101
+ ## 📁 Project Structure
102
+
103
+ ```
104
+ /
105
+ ├── app.py                              # Core application logic
106
+ ├── requirements.txt                    # Python dependencies
107
+ ├── README.md                           # Detailed app documentation
108
+ ├── images/                             # UI assets and examples
109
+ ├── modal/                              # Modal cloud functions
110
+ │   ├── flux-trellis-GGUF-text-to-3d.py # 3D generation pipeline
111
+ │   └── README.md                       # Modal setup documentation
112
+ ├── godot/                              # Godot test application
113
+ ├── LICENSE                             # MIT License
114
+ └── README.md                           # This file
115
+ ```
116
+
117
+ ## 🔧 Technical Details
118
+
119
+ ### AI Pipeline
120
+ - **Text Analysis**: Claude Sonnet processes biographical text to extract personality traits and interests (see the sketch after this list)
121
+ - **Prompt Generation**: AI creates detailed, contextual prompts for 3D asset generation
122
+ - **Asset Creation**: FLUX + Trellis pipeline generates high-quality 3D models
123
+
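+ A minimal sketch of the analysis step, assuming the Anthropic Python SDK with an illustrative model id and prompt (the actual prompt engineering lives in `app.py`):
+
+ ```python
+ import anthropic
+
+ bio = ("Marcus is a tech enthusiast and gaming streamer who loves mechanical keyboards "
+        "and collecting vintage arcade games.")
+
+ client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
+ message = client.messages.create(
+     model="claude-sonnet-4-20250514",  # illustrative model id
+     max_tokens=512,
+     messages=[{
+         "role": "user",
+         "content": f"List five short 3D asset prompts that reflect this player's interests:\n{bio}",
+     }],
+ )
+ asset_prompts = [line.strip("- ") for line in message.content[0].text.splitlines() if line.strip()]
+ print(asset_prompts)
+ ```
+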
124
+ ### Optimizations
125
+ - **GGUF Quantization**: Reduces model size while maintaining quality
126
+ - **LoRA Models**: Hyper FLUX 8Steps for faster inference, Game Assets LoRA for better 3D results
127
+ - **Modal Scaling**: Automatic scaling for concurrent requests
128
+
129
+ ## 🏆 Hackathon Team
130
+
131
+ - castlebbs@
132
+ - stargarnet@
133
+ - zinkenite@
134
+
135
+ Built with ❤️ for the 2025 Gradio Agents & MCP Hackathon
136
+
137
+ ## Links
138
+
139
+
140
+ - https://huggingface.co/black-forest-labs/FLUX.1-dev
141
+ - https://huggingface.co/microsoft/TRELLIS-image-large
142
+ - https://huggingface.co/spaces/gokaygokay/Flux-TRELLIS (thanks to gokaygokay for the GGUF quantization and LoRAs)
143
+
144
 
145
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
modal/README.md ADDED
@@ -0,0 +1,48 @@
1
+
2
+ ## Modal remote function
3
+ This code deploys the text_to_3d remote function to Modal.
4
+
5
+ ### FLUX Pipeline Models
6
+
7
+ Here is a description of the different models used in this pipeline.
8
+
9
+ 1. **T5 Text Encoder**
10
+ - Type: Transformer-based text encoder
11
+ - Function: Converts text prompts to embeddings
12
+ - Optimization: 8-bit quantization via BitsAndBytesConfig
13
+
14
+ 2. **FLUX Transformer2D Model**
15
+ - Type: Diffusion transformer for image generation
16
+ - Function: Core image generation from text embeddings
17
+ - Optimization: GGUF q8_0 quantization (from https://huggingface.co/gokaygokay)
18
+ - LoRA Models Applied:
19
+ - **Hyper FLUX 8Steps LoRA**: Reduces inference steps from ~20-50 to 8
20
+ - **gokaygokay's Flux Game Assets LoRA**: Improves 3D asset generation quality
21
+
22
+ ### TRELLIS Pipeline Models:
23
+ 3. **TRELLIS-image-large**
24
+ - Type: Multi-stage 3D generation model
25
+ - Function: Converts 2D images to 3D representations
26
+ - Components:
27
+ - **Sparse Structure Sampler**: Initial 3D structure generation
28
+ - **SLAT (Structured Latent) Sampler**: 3D refinement and detail enhancement
29
+
30
+ ### Supporting Models:
31
+ 4. **U2NET (via rembg)**
32
+ - Type: Salient object detection model
33
+ - Function: Background removal for clean 3D generation
34
+ - Used by: rembg library in TRELLIS preprocessing
35
+
36
+ ### Post-processing Components:
37
+ 5. **Mesh Simplification Algorithm**
38
+ - Function: Reduces polygon count (95% reduction default)
39
+ 6. **Texture Generator**
40
+ - Function: Creates 1024x1024 textures for 3D models
41
+ 7. **GLB Exporter**
42
+ - Function: Combines gaussian splatting + mesh into GLB format
43
+
44
+ ### Pipeline Sequence
45
+ ![pipeline-sequence](https://github.com/user-attachments/assets/5268a95f-8b6a-48e3-8bcc-86028b69ab46)
46
+
47
+
48
+
modal/flux-trellis-GGUF-text-to-3d.py ADDED
@@ -0,0 +1,448 @@
1
+ """
2
+ Modal app for FLUX + TRELLIS text-to-image-to-3D generation.
3
+ Uses GGUF from https://huggingface.co/gokaygokay: Flux Game Assets LoRA + Hyper FLUX 8Steps LoRA
4
+ We did not use the Flux Game Assets LoRA trigger word as we were happy with the results without it.
5
+
6
+ We experimented with TRELLIS text-to-3D models (TRELLIS-text-large, TRELLIS-text-xlarge), but we obtained
7
+ much better results with the image-to-3D pipeline using Flux.1-dev (with the LoRA weights applied) to generate
8
+ the initial image from prompt.
9
+
10
+ With the current memory management, this container uses less than 30 GB of memory while staying warm
11
+ and was tested successfully on an A100-40GB for many successive inferences without memory leaks.
12
+ On a warm container, we obtained a 3D asset in less than 30 seconds, which is a good result given there
13
+ is no optimization in this pipeline yet.
14
+
15
+ Created for participation in the Hugging Face 🤖 Gradio Agents & MCP Hackathon 2025 🚀
16
+ https://huggingface.co/Agents-MCP-Hackathon
17
+
18
+ Author: castlebbs
19
+ """
20
+
21
+ import os
22
+ import modal
23
+ import gc
24
+
25
+ cuda_version = "12.1.1"
26
+ flavor = "devel"
27
+ operating_sys = "ubuntu22.04"
28
+ tag = f"{cuda_version}-{flavor}-{operating_sys}"
29
+
30
+ image = (
31
+ modal.Image.from_registry(f"nvidia/cuda:{tag}", add_python="3.10")
32
+ .apt_install([
33
+ "git", "wget", "libgl1-mesa-glx", "libglib2.0-0", "libsm6",
34
+ "libxext6", "libxrender-dev", "libgomp1", "libjpeg-dev",
35
+ "build-essential", "ninja-build", "cmake",
36
+ ])
37
+ .env({
38
+ "CUDA_HOME": "/usr/local/cuda-12.1",
39
+ "PYTHONPATH": "/trellis",
40
+ "TORCH_CUDA_ARCH_LIST": "8.0",
41
+ "SPCONV_ALGO": "native",
42
+ })
43
+ .pip_install(
44
+ ["torch==2.4.0", "torchvision==0.19.0", "torchaudio==2.4.0"],
45
+ extra_options="--index-url https://download.pytorch.org/whl/cu121",
46
+ )
47
+ .pip_install([
48
+ "packaging", "wheel", "setuptools", "pillow", "imageio", "imageio-ffmpeg",
49
+ "tqdm", "easydict", "opencv-python-headless", "scipy", "ninja", "rembg",
50
+ "onnxruntime", "trimesh", "open3d", "xatlas", "pyvista", "pymeshfix",
51
+ "igraph", "transformers", "accelerate", "safetensors", "fastapi[standard]",
52
+ "diffusers>=0.30.0", "bitsandbytes", "sentencepiece", "peft", "gguf",
53
+ ])
54
+ .pip_install([
55
+ "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post2/flash_attn-2.7.0.post2+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
56
+ ])
57
+ .pip_install(["git+https://github.com/EasternJournalist/utils3d.git@9a4eb15e4021b67b12c460c7057d642626897ec8"])
58
+ .pip_install(["kaolin"], extra_options="-f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html")
59
+ .pip_install(["xformers==0.0.27.post2"], extra_options="--index-url https://download.pytorch.org/whl/cu121")
60
+ .pip_install(["spconv-cu121"])
61
+ .pip_install(["git+https://github.com/NVlabs/nvdiffrast.git"])
62
+ .run_commands([
63
+ "mkdir -p /tmp/extensions",
64
+ "git clone https://github.com/autonomousvision/mip-splatting.git /tmp/extensions/mip-splatting",
65
+ "export CXX=g++ && export CC=gcc && pip install /tmp/extensions/mip-splatting/submodules/diff-gaussian-rasterization/",
66
+ "git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git /trellis",
67
+ ])
68
+ .workdir("/trellis")
69
+ )
70
+
71
+ app = modal.App("flux-trellis-gguf-3d-pipeline", image=image)
72
+
73
+ models_volume = modal.Volume.from_name("trellis-models", create_if_missing=True)
74
+ torch_hub_volume = modal.Volume.from_name("torch-hub-cache", create_if_missing=True)
75
+ rembg_volume = modal.Volume.from_name("rembg-models", create_if_missing=True)
76
+
77
+ # Global variables to cache pipelines
78
+ _flux_pipeline = None
79
+ _trellis_pipeline = None
80
+ _generation_count = 0
81
+
82
+ def cleanup_memory():
83
+ """Aggressive memory cleanup without deleting pipelines"""
84
+ global _flux_pipeline, _trellis_pipeline
85
+
86
+ try:
87
+ import torch
88
+ except ImportError:
89
+ return
90
+
91
+ # Clear Flux pipeline caches
92
+ if _flux_pipeline is not None:
93
+ # Clear transformer attention caches
94
+ if hasattr(_flux_pipeline, 'transformer'):
95
+ if hasattr(_flux_pipeline.transformer, 'clear_cache'):
96
+ _flux_pipeline.transformer.clear_cache()
97
+
98
+ # Clear scheduler state
99
+ if hasattr(_flux_pipeline, 'scheduler'):
100
+ _flux_pipeline.scheduler.timesteps = None
101
+ if hasattr(_flux_pipeline.scheduler, 'sigmas'):
102
+ _flux_pipeline.scheduler.sigmas = None
103
+
104
+ # Clear TRELLIS pipeline caches
105
+ if _trellis_pipeline is not None:
106
+ # Clear any cached preprocessor states
107
+ if hasattr(_trellis_pipeline, 'image_processor'):
108
+ # Force clear any rembg cached tensors
109
+ try:
110
+ import rembg
111
+ if hasattr(rembg, 'bg_remover'):
112
+ rembg.bg_remover = None
113
+ except ImportError:
114
+ pass
115
+
116
+ # Force garbage collection
117
+ gc.collect()
118
+
119
+ # CUDA cleanup
120
+ if torch.cuda.is_available():
121
+ torch.cuda.empty_cache()
122
+ torch.cuda.synchronize()
123
+ torch.cuda.empty_cache()
124
+
125
+ def periodic_pipeline_reset():
126
+ """Reset pipelines every 100 generations to prevent memory accumulation"""
127
+ global _flux_pipeline, _trellis_pipeline, _generation_count
128
+
129
+ _generation_count += 1
130
+
131
+ if _generation_count % 100 == 0:
132
+ print(f"Performing periodic pipeline reset after {_generation_count} generations")
133
+
134
+ # Delete pipelines
135
+ if _flux_pipeline is not None:
136
+ del _flux_pipeline
137
+ _flux_pipeline = None
138
+
139
+ if _trellis_pipeline is not None:
140
+ del _trellis_pipeline
141
+ _trellis_pipeline = None
142
+
143
+ # Aggressive cleanup
144
+ gc.collect()
145
+
146
+ # Import torch for CUDA cleanup
147
+ try:
148
+ import torch
149
+ if torch.cuda.is_available():
150
+ torch.cuda.empty_cache()
151
+ torch.cuda.synchronize()
152
+ except ImportError:
153
+ pass
154
+
155
+ print("Pipeline reset completed")
156
+
157
+ def get_flux_pipeline():
158
+ """Get or initialize Flux pipeline"""
159
+ global _flux_pipeline
160
+
161
+ if _flux_pipeline is None:
162
+ import torch
163
+ from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig
164
+ from transformers import T5EncoderModel, BitsAndBytesConfig
165
+
166
+ print("Loading Flux pipeline...")
167
+ dtype = torch.bfloat16
168
+
169
+ file_url = "https://huggingface.co/gokaygokay/flux-game/blob/main/hyperflux_00001_.q8_0.gguf"
170
+ single_file_base_model = "camenduru/FLUX.1-dev-diffusers"
171
+
172
+ quantization_config = BitsAndBytesConfig(
173
+ load_in_8bit=True, bnb_8bit_compute_dtype=torch.bfloat16
174
+ )
175
+ text_encoder_2 = T5EncoderModel.from_pretrained(
176
+ single_file_base_model,
177
+ subfolder="text_encoder_2",
178
+ torch_dtype=dtype,
179
+ quantization_config=quantization_config,
180
+ cache_dir="/models/hf_cache",
181
+ )
182
+
183
+ # Cache the GGUF file locally first
184
+ gguf_cache_path = "/models/gguf_cache/hyperflux_00001_.q8_0.gguf"
185
+ os.makedirs("/models/gguf_cache", exist_ok=True)
186
+
187
+ # Only download if not already cached
188
+ if not os.path.exists(gguf_cache_path):
189
+ print("Downloading GGUF file to cache...")
190
+ from huggingface_hub import hf_hub_download
191
+ downloaded_path = hf_hub_download(
192
+ repo_id="gokaygokay/flux-game",
193
+ filename="hyperflux_00001_.q8_0.gguf",
194
+ cache_dir="/models/hf_cache"
195
+ )
196
+ # Copy to our persistent location
197
+ import shutil
198
+ shutil.copy2(downloaded_path, gguf_cache_path)
199
+ print(f"GGUF file cached at {gguf_cache_path}")
200
+ else:
201
+ print(f"Using cached GGUF file at {gguf_cache_path}")
202
+
203
+ transformer = FluxTransformer2DModel.from_single_file(
204
+ gguf_cache_path,
205
+ subfolder="transformer",
206
+ quantization_config=GGUFQuantizationConfig(compute_dtype=dtype),
207
+ torch_dtype=dtype,
208
+ config=single_file_base_model,
209
+ )
210
+
211
+ _flux_pipeline = FluxPipeline.from_pretrained(
212
+ single_file_base_model,
213
+ transformer=transformer,
214
+ text_encoder_2=text_encoder_2,
215
+ torch_dtype=dtype,
216
+ )
217
+ _flux_pipeline.to("cuda")
218
+ print("Flux pipeline loaded")
219
+
220
+ return _flux_pipeline
221
+
222
+ def get_trellis_pipeline(trellis_model_name):
223
+ """Get or initialize TRELLIS pipeline"""
224
+ global _trellis_pipeline
225
+
226
+ if _trellis_pipeline is None:
227
+ print(f"Loading TRELLIS pipeline: {trellis_model_name}")
228
+ from trellis.pipelines import TrellisImageTo3DPipeline
229
+
230
+ _trellis_pipeline = TrellisImageTo3DPipeline.from_pretrained(trellis_model_name)
231
+ _trellis_pipeline.cuda()
232
+ print("TRELLIS pipeline loaded")
233
+
234
+ return _trellis_pipeline
235
+
236
+ @app.function(
237
+ gpu="A100",
238
+ volumes={
239
+ "/models": models_volume,
240
+ "/cache/torch_hub": torch_hub_volume,
241
+ "/cache/rembg": rembg_volume,
242
+ },
243
+ secrets=[modal.Secret.from_name("huggingface")],
244
+ scaledown_window=300,  # scale down after 5 idle minutes
245
+ timeout=3600,  # allow up to one hour per invocation
246
+ memory=32768,  # request 32 GB of container RAM
247
+ enable_memory_snapshot=True,  # snapshot memory for faster cold starts
248
+ )
249
+ def text_to_3d(
250
+ text_prompt: str,
251
+ trellis_model_name: str = "JeffreyXiang/TRELLIS-image-large",
252
+ seed: int = 1,
253
+ image_width: int = 1024,
254
+ image_height: int = 1024,
255
+ guidance_scale: float = 3.5,
256
+ num_inference_steps: int = 8,
257
+ ss_guidance_strength: float = 7.5,
258
+ ss_sampling_steps: int = 12,
259
+ slat_guidance_strength: float = 3.0,
260
+ slat_sampling_steps: int = 12,
261
+ mesh_simplify: float = 0.95,
262
+ texture_size: int = 1024,
263
+ ) -> dict:
264
+ """Generate 3D assets from text prompts using FLUX + TRELLIS pipeline."""
265
+
266
+ import torch
267
+ import numpy as np
268
+ from PIL import Image
269
+ from trellis.utils import postprocessing_utils
270
+ from io import BytesIO
271
+
272
+ # Set environment variables
273
+ os.environ.update({
274
+ "SPCONV_ALGO": "native",
275
+ "ATTN_BACKEND": "flash-attn",
276
+ "TORCH_CUDA_ARCH_LIST": "8.0",
277
+ "HF_HOME": "/models/hf_cache",
278
+ "HF_DATASETS_CACHE": "/models/hf_cache",
279
+ "HF_HUB_CACHE": "/models/hf_cache",
280
+ "TORCH_HOME": "/cache/torch_hub",
281
+ "U2NET_HOME": "/cache/rembg",
282
+ })
283
+
284
+ # Create cache directories
285
+ for path in ["/models/hf_cache", "/cache/torch_hub", "/cache/rembg"]:
286
+ os.makedirs(path, exist_ok=True)
287
+
288
+ print("Starting pipeline initialization...")
289
+
290
+ # Check if we need periodic reset
291
+ periodic_pipeline_reset()
292
+
293
+ # Memory tracking
294
+ initial_memory = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
295
+ print(f"Initial GPU memory: {initial_memory / 1e9:.2f}GB")
296
+
297
+ try:
298
+ # Get pipelines (will load if not cached)
299
+ flux_pipeline = get_flux_pipeline()
300
+ trellis_pipeline = get_trellis_pipeline(trellis_model_name)
301
+
302
+ # Generate image with explicit memory management
303
+ print(f"Generating image from prompt: '{text_prompt}'")
304
+ device = "cuda"
305
+
306
+ with torch.no_grad():
307
+ generator = torch.Generator(device=device).manual_seed(seed)
308
+
309
+ generated_image = flux_pipeline(
310
+ prompt=text_prompt,
311
+ guidance_scale=guidance_scale,
312
+ num_inference_steps=num_inference_steps,
313
+ width=image_width,
314
+ height=image_height,
315
+ generator=generator,
316
+ ).images[0]
317
+
318
+ # Clear generator and intermediate tensors
319
+ del generator
320
+ cleanup_memory()
321
+
322
+ print("Image generation completed successfully")
323
+
324
+ # Preprocess image for TRELLIS
325
+ print("Preprocessing image for 3D generation...")
326
+ with torch.no_grad():
327
+ processed_image = trellis_pipeline.preprocess_image(generated_image)
328
+
329
+ cleanup_memory() # Clear rembg intermediate tensors
330
+ print("Image preprocessing completed")
331
+
332
+ # Generate 3D from image
333
+ print("Generating 3D asset from image...")
334
+ outputs = trellis_pipeline.run(
335
+ processed_image,
336
+ seed=seed,
337
+ formats=["gaussian", "mesh"],
338
+ preprocess_image=False,
339
+ sparse_structure_sampler_params={
340
+ "steps": ss_sampling_steps,
341
+ "cfg_strength": ss_guidance_strength,
342
+ },
343
+ slat_sampler_params={
344
+ "steps": slat_sampling_steps,
345
+ "cfg_strength": slat_guidance_strength,
346
+ },
347
+ )
348
+
349
+ cleanup_memory() # Clear 3D generation intermediate tensors
350
+ print("3D generation completed successfully")
351
+
352
+ # Prepare result
353
+ result = {
354
+ "text_prompt": text_prompt,
355
+ "seed": seed,
356
+ "trellis_model_name": trellis_model_name,
357
+ "image_generation_params": {
358
+ "width": image_width,
359
+ "height": image_height,
360
+ "guidance_scale": guidance_scale,
361
+ "num_inference_steps": num_inference_steps,
362
+ },
363
+ "3d_generation_params": {
364
+ "ss_guidance_strength": ss_guidance_strength,
365
+ "ss_sampling_steps": ss_sampling_steps,
366
+ "slat_guidance_strength": slat_guidance_strength,
367
+ "slat_sampling_steps": slat_sampling_steps,
368
+ },
369
+ }
370
+
371
+ # Save generated image
372
+ img_buffer = BytesIO()
373
+ generated_image.save(img_buffer, format="PNG")
374
+ result["generated_image"] = img_buffer.getvalue()
375
+
376
+ # Generate GLB file
377
+ if outputs.get("gaussian") and outputs.get("mesh"):
378
+ print("Generating GLB file...")
379
+ glb = postprocessing_utils.to_glb(
380
+ outputs["gaussian"][0],
381
+ outputs["mesh"][0],
382
+ simplify=mesh_simplify,
383
+ texture_size=texture_size,
384
+ )
385
+ result["glb_file"] = glb.export(file_type="glb")
386
+ print("GLB generation completed successfully")
387
+ else:
388
+ print("Warning: Both gaussian and mesh outputs required for GLB generation")
389
+
390
+ # Final cleanup
391
+ cleanup_memory()
392
+
393
+ final_memory = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
394
+ print(f"Final GPU memory: {final_memory / 1e9:.2f}GB")
395
+ print(f"Memory delta: {(final_memory - initial_memory) / 1e6:.1f}MB")
396
+
397
+ return result
398
+
399
+ except Exception as e:
400
+ print(f"Error during generation: {e}")
401
+ cleanup_memory()
402
+ raise
403
+
404
+ @app.local_entrypoint()
405
+ def main(
406
+ text_prompt: str = "An isometric 3D dragon with two heads, white background",
407
+ trellis_model_name: str = "JeffreyXiang/TRELLIS-image-large",
408
+ seed: int = 1,
409
+ ):
410
+ """Local entrypoint for testing the text-to-image-to-3D generation."""
411
+ print(f"Starting text-to-image-to-3D generation...")
412
+ print(f"Prompt: {text_prompt}")
413
+ print(f"TRELLIS Model: {trellis_model_name}")
414
+ print(f"Seed: {seed}")
415
+
416
+ result = text_to_3d.remote(
417
+ text_prompt=text_prompt,
418
+ trellis_model_name=trellis_model_name,
419
+ seed=seed,
420
+ )
421
+
422
+ print(f"Generation completed!")
423
+ print(f"Result keys: {list(result.keys())}")
424
+
425
+ import os
426
+ output_dir = "modal_outputs"
427
+ os.makedirs(output_dir, exist_ok=True)
428
+
429
+ if "generated_image" in result:
430
+ with open(os.path.join(output_dir, "generated_image.png"), "wb") as f:
431
+ f.write(result["generated_image"])
432
+ print(f"Saved: {output_dir}/generated_image.png")
433
+
434
+ if "glb_file" in result:
435
+ with open(os.path.join(output_dir, "model.glb"), "wb") as f:
436
+ f.write(result["glb_file"])
437
+ print(f"Saved: {output_dir}/model.glb")
438
+
439
+ return result
440
+
441
+ if __name__ == "__main__":
442
+ import sys
443
+ prompt = sys.argv[1] if len(sys.argv) > 1 else "An isometric 3D dragon with two heads, white background"
444
+ model = sys.argv[2] if len(sys.argv) > 2 else "JeffreyXiang/TRELLIS-image-large"
445
+ seed = int(sys.argv[3]) if len(sys.argv) > 3 else 1
446
+
447
+ with app.run():
448
+ main(prompt, model, seed)