smaller dense versions?

#8
by TheBigBlockPC - opened

Are smaller dense distills of this model planned? bitsandbytes quantisation doesn't really work on MoE models, so a 40B dense version would be useful for local deployment.

If you look at videos on YouTube, the model isn't even that good. For an 80B model it seems quite bad.

How does it compare to Qwen-Image?
