smaller dense versions?
#8, opened by TheBigBlockPC
Are smaller, dense distills of this model planned? bitsandbytes quantisation doesn't really work on MoE models, so a 40B dense version would be useful for local deployment.
If you look at videos on YouTube, the model isn't even that good. For an 80B model it seems quite bad.
How does it compare to Qwen-Image?