Loading the model in 16-bit precision (float16 or bfloat16)
Hi,
Thanks for the support and for sharing this repo. I want to load the models in float16 or bfloat16, but even though I have 46 GB of GPU memory I am still facing out-of-memory issues while just loading the models.
Below are the things I tried on an AWS g6e.2xlarge (https://instances.vantage.sh/aws/ec2/g6e.2xlarge):
tried loading in 16-bit (float16) --> OOM error
tried loading in bfloat16 --> dtype mismatch error in the prepare_latents method, which expects float32
tried loading in bfloat16 and updating the dtype in the prepare_latents method --> OOM error during generation
tried CPU offload --> still an OOM error (see the sketch below for how I set this up)
When loaded with bfloat16, the model occupied 44221 MiB / 46068 MiB on the NVIDIA L40S.
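For reference, this is roughly how I am loading the pipeline and enabling offloading. This is a minimal sketch assuming a diffusers-style pipeline; the pipeline class, model ID, and prompt are placeholders, since I am not sure they match this repo exactly:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder model ID -- the actual checkpoint from this repo goes here.
MODEL_ID = "path/or/hub-id-of-the-model"

# Load the pipeline weights in bfloat16 instead of the default float32.
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Attempt 1: move the whole pipeline to the GPU (this is where I hit OOM).
# pipe = pipe.to("cuda")

# Attempt 2: offload submodules to CPU and move each one to the GPU only
# while it is needed (still OOM in my case).
pipe.enable_model_cpu_offload()

# Generation -- the prepare_latents dtype mismatch shows up around here,
# when the pipeline internally creates float32 latents while the weights
# are in bfloat16. My workaround was to pass/cast the latents dtype to
# bfloat16 inside prepare_latents.
image = pipe(prompt="a test prompt").images[0]
```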
Can you help me figure out how to proceed, or do I need to increase the compute (a larger GPU or more memory)? Please share the required details.
Thanks in advance,
Zeeshan