Do it for your MoE models? (Or do larger MoE models?)

by deltanym - opened

I'm glad a400m and a800m exist, because being MoE makes them very fast for their size.
(IMO there's a big gap around the ~8b and medium sizes, where there aren't really any MoE options at all.)
Either way, it would be great to see this on the MoE models.
