Loading the model in 16-bit precision (float16 or bfloat16)
Hi,
Thanks for the support and for sharing this repo. I want to load the models in float16 or bfloat16, but even though I have 46 GB of GPU memory I am still facing out-of-memory issues while just loading the models.
Below are the things I tried on an AWS g6e.2xlarge (https://instances.vantage.sh/aws/ec2/g6e.2xlarge):
tried loading in 16-bit (float16) --> OOM error
tried loading in bfloat16 --> dtype mismatch error in the prepare_latents method, which expects float32
tried loading in bfloat16 and updating the dtype in the prepare_latents method --> OOM error during generation
tried CPU offload --> still an OOM error (see the sketch below for how I set this up)
When loaded with bfloat16, the model occupied 44221 MiB / 46068 MiB on the NVIDIA L40S.
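For reference, this is roughly how I am loading the pipeline and enabling offloading. This is a minimal sketch assuming a diffusers-style pipeline; the pipeline class, model ID, and prompt are placeholders, since I am not sure they match this repo exactly:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder model ID -- the actual checkpoint from this repo goes here.
MODEL_ID = "path/or/hub-id-of-the-model"

# Load the pipeline weights in bfloat16 instead of the default float32.
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Attempt 1: move the whole pipeline to the GPU (this is where I hit OOM).
# pipe = pipe.to("cuda")

# Attempt 2: offload submodules to CPU and move each one to the GPU only
# while it is needed (still OOM in my case).
pipe.enable_model_cpu_offload()

# Generation -- the prepare_latents dtype mismatch shows up around here,
# when the pipeline internally creates float32 latents while the weights
# are in bfloat16. My workaround was to pass/cast the latents dtype to
# bfloat16 inside prepare_latents.
image = pipe(prompt="a test prompt").images[0]
```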
Can you help me figure out how to proceed, or do I need to increase the compute (a larger GPU or more memory)? Please share the required details.
Thanks in advance,
Zeeshan