fp8_e5m2 version

#1
by phazei - opened

For those of us with 3090s who want to use torch compile, would you mind providing one with e5m2, please?

oic, that node doesn't support it :/

And that node just converts every layer; it isn't selective like Kijai's nodes.

I wrote a script to convert some of the other Wan2.1 models (SkyReels) into fp8_e5m2, which Kijai didn't provide. It would be pretty easy to do the same thing with this model; probably the same script would work since it's a fine-tune of Wan. It specifically keeps the layers that Kijai left as fp32 and converts the rest. But I'd have to download the original 60GB model, which is... a lot for my connection. So if you still have it, the script could easily be adapted to fp8_e4m3 as well.
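Roughly, the conversion loop looks like this (a minimal sketch, not the exact script; the keep-list and file names here are made up, the real logic is in my scripts):

```python
# Sketch: convert a sharded safetensors checkpoint to fp8_e5m2 one shard at a
# time, leaving selected layers at their original precision.
import glob

import torch
from safetensors.torch import load_file, save_file

TARGET_FP8_DTYPE = torch.float8_e5m2  # swap for torch.float8_e4m3fn to get an e4m3 variant
KEEP_PATTERNS = ("norm", "bias", "patch_embedding")  # hypothetical: substrings of layer names to keep as-is

for shard_path in sorted(glob.glob("diffusion_pytorch_model-*.safetensors")):
    tensors = load_file(shard_path, device="cpu")  # only one shard in memory at a time
    out = {}
    for name, tensor in tensors.items():
        if any(p in name for p in KEEP_PATTERNS) or not tensor.is_floating_point():
            out[name] = tensor  # keep the small/sensitive layers untouched
        else:
            out[name] = tensor.to(TARGET_FP8_DTYPE)
    save_file(out, shard_path.replace(".safetensors", "_fp8_e5m2.safetensors"))
```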

Also, did you just convert from ComfyUI's repackaged fp16 model and not the source fp32? Because that would also be a small drop in quality, I think.

I converted it from the repackaged fp16, but I'll happily download and convert from the originals using your scripts, if my PC can handle it.

I uploaded the e5m2 version, dunno if I did it right. It's 1GB bigger.

All I changed in the scripts was the naming format for the shards/JSON (it was different from how they're named in Wan's repository) and the output location.

Oh good, you found my scripts.
I could have sworn I posted a link, but I don't see it, whoops.
For future reference:
https://huggingface.co/phazei/phazei-SkyReels-V2-fp8-e5m2/tree/main/scripts

It would make sense that it's a little bigger, because it leaves some of the layers at fp32 like Kijai did; they were the smaller layers to begin with. I have a 3090 and it got through it no problem. I made sure the scripts only load the shards and layers into memory one at a time, so it's not very memory intensive. The only thing that takes a lot of memory is combining all the shards, but that's RAM only, no VRAM needed there, so most systems can handle it.
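That combine step is basically just accumulating every converted shard into one state dict and saving a single file, roughly like this sketch (file names here are hypothetical), which is why it only touches system RAM:

```python
# Sketch of the merge step: load every converted shard on CPU, accumulate into
# one state dict, and write out a single safetensors file. No VRAM is involved.
import glob

from safetensors.torch import load_file, save_file

merged = {}
for shard_path in sorted(glob.glob("diffusion_pytorch_model-*_fp8_e5m2.safetensors")):
    merged.update(load_file(shard_path, device="cpu"))  # everything stays in system RAM

save_file(merged, "wan2.1_fp8_e5m2.safetensors")
```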

Thanks for trying it out and uploading it! My ISP sent me a text saying I've nearly reached my download cap for this month; AI models take so much bandwidth, heh.

Oh, and if you want to see the difference, it's easy to change the script to do e4m3 with this one-line change:

```python
TARGET_FP8_DTYPE = torch.float8_e4m3fn
```

Yep, already did :)

Although I don't know if it's better or not; it just gives slightly different results, both good.
