Out of memory error

Hi Adam,

I got the following error when running the sample script to generate an image. I have 40 GB of GPU memory. Any ideas?

```
Traceback (most recent call last):
  File "/home/jeffling/ai-toolkit/adam_run.py", line 8, in <module>
    pipeline.to("cuda")
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 460, in to
    module.to(device, dtype)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 1060, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB. GPU 0 has a total capacity of 39.38 GiB of which 128.25 MiB is free. Including non-PyTorch memory, this process has 39.24 GiB memory in use. Of the allocated memory 38.82 GiB is allocated by PyTorch, and 20.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

Hey Jeff,

Loading the full base Flux model via diffusers requires ~70 GB of VRAM, so you'd want something like an H100 to run it at full precision.
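
That said, if you want to stay in diffusers on your 40 GB card, loading the weights in bfloat16 and letting diffusers shuttle submodels between CPU and GPU may get you under the limit. A minimal sketch, assuming the standard FLUX.1-dev checkpoint (the prompt and sampler settings are just placeholders):

```python
import torch
from diffusers import FluxPipeline

# bfloat16 roughly halves the footprint relative to fp32.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumption: whichever base checkpoint you're using
    torch_dtype=torch.bfloat16,
)

# Instead of pipe.to("cuda"): only the submodel currently running stays on the GPU.
pipe.enable_model_cpu_offload()

image = pipe(
    "a photo of an astronaut riding a horse",  # placeholder prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("out.png")
```

It's slower per image than keeping everything resident, but it avoids the single up-front `.to("cuda")` allocation that's failing in your traceback.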

If an H100 isn't an option, I'd recommend running it through ComfyUI with the workflow I've attached in the model card and files, using quantized versions of T5, CLIP, and the main Flux transformer. The ComfyUI setup can run in low-VRAM environments of ~24 GB.

See some of these resources:
https://huggingface.co/Kijai/flux-fp8/discussions/7
https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn.safetensors
https://huggingface.co/city96/FLUX.1-dev-gguf
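
For reference, recent diffusers releases can also load the GGUF files from the city96 repo above directly, so the quantized route doesn't strictly require ComfyUI. A rough sketch, assuming a diffusers version with GGUF support plus the gguf package installed; Q2_K is just one of the quantization levels available in that repo:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF-quantized Flux transformer from the repo linked above.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# The rest of the pipeline (VAE, CLIP, T5) still comes from the base repo.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("a photo of a cat").images[0]  # placeholder prompt
image.save("flux-gguf.png")
```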
