Wan 2.1 14B I2V 720p Q6_K quantized GGUF

#6
by clinicallyautomated - opened

Hey guys, I saw a YouTube tutorial that claimed to run the Wan 2.1 14B I2V 720p Q6_K quantized GGUF model successfully on 8GB of VRAM, so I thought I would try it.

I downloaded city96's GGUF model from here: https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf?show_file_info=wan2.1-i2v-14b-720p-Q6_K.gguf

I used this workflow: https://tensor.art/workflows/83641138...

Well, it worked (which shocked me), but the output was very bad: super glitchy, weird stuff happens.

I tried all sorts of different configurations: steps 25-50, CFG 4-10, denoise 0.4-1.0, euler/uni_pc/dpmpp_2m sampler_names, and simple/normal schedulers. I was able to get results without crashing my computer, but the outputs were just so bad; even after a lot of experimentation I couldn't get a single decent output. Any tips for getting better outputs? Should I try the calcuis GGUF models instead of city96? A lower quantization? 480p instead of 720p?

Thanks in advance guys!

Can you post a full screenshot of the workflow? The link you posted above seems to be cut off.

There are a few things that could cause issues like that, but it's hard to troubleshoot without more info.

@city96 Are the dimensions of the GGUF files really accurate?

I am getting mismatch errors:

  While copying the parameter named "blocks.20.ffn.2.weight", whose dimensions in the model are torch.Size([5120, 13824]) and whose dimensions in the checkpoint are torch.Size([5120, 14688]), an exception occurred : ('only Tensors of floating point dtype can require gradients',).
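
The second half of that exception is what PyTorch raises whenever non-float data ends up in a regular nn.Parameter (which defaults to requires_grad=True). A minimal reproduction, noting that the exact message wording varies slightly between torch versions:

```python
import torch

# Packed GGUF blocks are integer bytes; wrapping them in an nn.Parameter
# (requires_grad defaults to True) triggers the same RuntimeError:
try:
    torch.nn.Parameter(torch.zeros(5120, 14688, dtype=torch.uint8))
except RuntimeError as e:
    print(e)  # e.g. "Only Tensors of floating point ... dtype can require gradients"
```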

Official model config:

{
  "has_image_input": true,
  "patch_size": [1, 2, 2],
  "in_dim": 36,
  "dim": 5120,
  "ffn_dim": 13824,
  "freq_dim": 256,
  "text_dim": 4096,
  "out_dim": 16,
  "num_heads": 40,
  "num_layers": 40,
  "eps": 1e-6
}
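
A quick way to cross-check this config against the file itself is to read the logical shapes straight out of the GGUF header (a sketch using the `gguf` Python package; the local file name is an assumption, and GGUF lists dimensions in reverse order relative to PyTorch):

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("wan2.1-i2v-14b-720p-Q6_K.gguf")  # adjust to your path
for t in reader.tensors:
    if t.name == "blocks.20.ffn.2.weight":
        # With dim=5120 and ffn_dim=13824 this should line up with the
        # [5120, 13824] torch shape (GGUF order may show it reversed).
        print(t.name, list(t.shape), t.tensor_type)
```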

@MonsterMMORPG
I assume you are looking at the shape of the quantized data, or attempting to load the quantized data into a regular nn.Linear layer.
The dimensions of the GGUF files are indeed correct, which you can verify in the Hugging Face metadata viewer:
[screenshot: Hugging Face metadata viewer showing the tensor shapes]
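
For what it's worth, 14688 / 13824 = 17/16, which matches the Q8_0 packed layout (34 bytes per block of 32 weights: 32 int8 values plus one fp16 scale), so the error above is consistent with reading the raw quantized blocks of a Q8_0 file as if they were the float weight matrix. Dequantizing first recovers the logical shape; a minimal sketch using the `gguf` Python package (the file name is an assumption):

```python
import torch
from gguf import GGUFReader
from gguf.quants import dequantize  # needs a recent `gguf` release

reader = GGUFReader("wan2.1-i2v-14b-720p-Q8_0.gguf")  # file name assumed
for t in reader.tensors:
    if t.name == "blocks.20.ffn.2.weight":
        print(t.data.shape)  # packed bytes, e.g. (5120, 14688) for Q8_0
        w = dequantize(t.data, t.tensor_type)
        print(w.shape)       # logical floats, expect (5120, 13824)
        weight = torch.from_numpy(w.copy())  # now usable as a float weight
```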

OK, in that case that repo's implementation is inaccurate. Thanks!

I used the implementation from here: https://github.com/modelscope/DiffSynth-Engine/blob/bc3824d027526779c72f04ec9b7bd39f861eac2b/diffsynth_engine/utils/gguf.py#L9
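
For comparison, the simplest correct approach is to dequantize everything up front into a plain state dict (a rough sketch with the `gguf` package, not the DiffSynth-Engine API; it gives up the VRAM savings that on-the-fly dequantization keeps):

```python
import torch
from gguf import GGUFReader
from gguf.quants import dequantize

def gguf_to_state_dict(path):
    """Dequantize every tensor in a GGUF file into float torch tensors."""
    reader = GGUFReader(path)
    sd = {}
    for t in reader.tensors:
        arr = dequantize(t.data, t.tensor_type)    # packed blocks -> floats
        sd[t.name] = torch.from_numpy(arr.copy())  # copy: data is a read-only memmap
    return sd
```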

I haven't heard about that specific backend before, but it looks like they have added support here: https://github.com/modelscope/DiffSynth-Engine/pull/21
(including test cases that use the models from the T2V version of this repo)

I assume you've already updated to the latest version? As a sanity check, you could also try the models defined as the test case(s) in the PR above:
[screenshot: test cases from the PR]

Hey @city96, I've managed to get the new LTX 13B and its quants to work. It's just a workaround so far, but I'm uploading the GGUFs already (;
https://huggingface.co/wsbagnsv1/ltxv-13b-0.9.7-dev-GGUF
