Multi-GPU / Parallel Processing Support
#3 by Iotcv · opened
We are trying to run this model on multiple GPUs, but noticed that it currently only utilizes a single GPU, which leads to out-of-memory (OOM) errors.
Any guidance on best practices for running this model across multiple GPUs would be very helpful.
Looking forward to exploring more with this model.
Thanks
Thank you for your attention. You can try the script below to enable multi-GPU / parallel processing:
...
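# Each process launched by torchrun joins the default process group and gets its own rank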
import torch.distributed as dist
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
...
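# Pin this rank's copy of the pipeline to its own GPU (rank i -> cuda:i)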
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")
...
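# Offset the seed by rank so each GPU generates a different sample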
image = pipeline.generate_image(
    ...
    seed=42 + rank,
)[0]
image.save(f"./assets/output_{rank}.png")
Then use torchrun to start the inference:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc-per-node=8 your_scripts.py
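For reference, here is a minimal self-contained sketch of the data-parallel script above. It assumes the model loads through transformers with trust_remote_code; the model id, the NextStepPipeline import path, and the positional prompt argument to generate_image are assumptions, so adjust them to match the repo's single-GPU example:

```python
import torch
import torch.distributed as dist
from transformers import AutoModel, AutoTokenizer

from models.gen_pipeline import NextStepPipeline  # assumed import path

MODEL_ID = "stepfun-ai/NextStep-1-Large"  # assumed model id


def main():
    # torchrun sets the env vars init_process_group reads (RANK, WORLD_SIZE, ...)
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)  # bind this process to its own GPU

    # Assumed loading path; mirror the repo's single-GPU example here
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
    )

    pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")

    # Same prompt on every rank, a different seed per rank -> one distinct image per GPU
    image = pipeline.generate_image(
        "a photo of a corgi surfing a wave",  # hypothetical prompt
        seed=42 + rank,
    )[0]
    image.save(f"./assets/output_{rank}.png")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Note that this pattern is data parallelism: every rank loads a full copy of the model, so it increases throughput (one image per GPU per run) but does not reduce per-GPU memory use. If the OOM comes from the model itself not fitting on a single GPU, you would instead need some form of model sharding (e.g. device_map="auto" via accelerate) rather than per-rank replication.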