Can this be run using Transformers?

#2
by FremyCompany - opened

If yes, does it require this file, or a fork?
I was also wondering why the total weight of the model seems to be 400 GB; shouldn't an FP4 DeepSeek be around 300 GB? Is this entirely due to the self-attention layers not being quantized?
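For what it's worth, here is a back-of-envelope sketch of where the gap could come from, assuming ~671B total parameters, FP4 at 4 bits/parameter, and a hypothetical share of parameters (the `BF16_PARAMS` figure below is a made-up illustration, not the model's actual split) left unquantized in BF16:

```python
# Rough weight-size estimate. Assumptions (not from the model card):
# - total parameter count ~671B (DeepSeek-V3/R1 scale)
# - FP4 stores 4 bits per parameter, BF16 stores 16 bits per parameter
# - quantization scale factors and metadata are ignored
def size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 671e9   # assumed total parameter count
BF16_PARAMS = 50e9     # hypothetical parameters kept unquantized in BF16

all_fp4 = size_gb(TOTAL_PARAMS, 4)
mixed = size_gb(TOTAL_PARAMS - BF16_PARAMS, 4) + size_gb(BF16_PARAMS, 16)

print(f"all FP4:        ~{all_fp4:.0f} GB")   # fully quantized estimate
print(f"mixed FP4/BF16: ~{mixed:.0f} GB")     # with some layers in BF16
```

So a fully-FP4 checkpoint lands in the ~330 GB range, and keeping even a modest fraction of the weights in BF16 pushes the total noticeably higher, which would be consistent with unquantized attention layers explaining part of the difference.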