Can this be run using Transformers?

#2
by FremyCompany - opened

If yes, does it require this file, or a fork?
I was also wondering why the total weight of the model seems to be 400 GB; shouldn't an FP4 DeepSeek be around 300 GB? Is this entirely due to the self-attention layers not being quantized?
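For what it's worth, here is a back-of-envelope sketch of where the gap could come from, assuming ~671B total parameters, FP4 at 4 bits/parameter, and a hypothetical share of parameters (the `BF16_PARAMS` figure below is a made-up illustration, not the model's actual split) left unquantized in BF16:

```python
# Rough weight-size estimate. Assumptions (not from the model card):
# - total parameter count ~671B (DeepSeek-V3/R1 scale)
# - FP4 stores 4 bits per parameter, BF16 stores 16 bits per parameter
# - quantization scale factors and metadata are ignored
def size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 671e9   # assumed total parameter count
BF16_PARAMS = 50e9     # hypothetical parameters kept unquantized in BF16

all_fp4 = size_gb(TOTAL_PARAMS, 4)
mixed = size_gb(TOTAL_PARAMS - BF16_PARAMS, 4) + size_gb(BF16_PARAMS, 16)

print(f"all FP4:        ~{all_fp4:.0f} GB")   # fully quantized estimate
print(f"mixed FP4/BF16: ~{mixed:.0f} GB")     # with some layers in BF16
```

So a fully-FP4 checkpoint lands in the ~330 GB range, and keeping even a modest fraction of the weights in BF16 pushes the total noticeably higher, which would be consistent with unquantized attention layers explaining part of the difference.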