Considering a distilled ~80B-parameter version
Hi Moonshot,
Have you ever considered creating a ~78B MoE distilled version of Kimi K2?
You might ask why 78B. It's pretty simple: a good Q4_0 quantisation brings it down to roughly 40 GB, which means anyone with two 20-24 GB GPUs could run the model at home in Q4 without any trouble and at very good speed. Anything much above ~80B parameters can no longer be run on two local GPUs...
The community has plenty of models in the 0-30B range for a single GPU, but unfortunately there is not a single model targeted at dual-GPU setups...
The 120B GPT-OSS doesn't fit into two 20 GB GPUs, and neither does the GLM Air model... If both were around 78B parameters, dual-GPU users could benefit a lot from running them in Q4.
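As a rough sanity check of that arithmetic, here is a small Python sketch (purely illustrative: it assumes the nominal Q4_0 rate of 18 bytes per 32-weight block, treats GLM Air as roughly 106B and GPT-OSS as 120B total parameters, and only budgets a fixed few GiB of headroom for KV cache and runtime buffers):

```python
# Rough back-of-the-envelope check: does an N-parameter model, quantised to
# Q4_0, fit into a given multi-GPU VRAM budget?
# Assumptions (mine): Q4_0 stores 32 weights per 18-byte block (~4.5 bits/weight);
# KV cache, activations and any higher-precision layers are only covered by a
# fixed headroom guess.

GIB = 1024 ** 3  # bytes per GiB

def q4_0_size_gib(n_params: float) -> float:
    """Approximate in-VRAM size of the Q4_0-quantised weights, in GiB."""
    bytes_per_weight = 18 / 32  # one 18-byte block holds 32 weights
    return n_params * bytes_per_weight / GIB

def fits(n_params: float, gpus: int, vram_gib: float, headroom_gib: float = 4.0) -> bool:
    """True if the quantised weights plus a little headroom fit into the
    combined VRAM of `gpus` cards."""
    return q4_0_size_gib(n_params) + headroom_gib <= gpus * vram_gib

# 78B (the requested size), ~106B (GLM-Air-class), 120B (GPT-OSS-class)
for n_params in (78e9, 106e9, 120e9):
    size = q4_0_size_gib(n_params)
    print(f"{n_params / 1e9:.0f}B -> ~{size:.1f} GiB at Q4_0, fits 2 x 24 GB: {fits(n_params, 2, 24)}")
```

With those assumptions, a ~78B model lands around 41 GiB and fits in 48 GB of combined VRAM, while the ~106B and 120B models do not.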
So would you consider creating a flash version that is approx. 80B parameters?
Have you ever thought about why GPT-OSS and GLM Air don't fit in 2 x 24GB GPUs? ^_^
why why tell me why
Would love to know why... So that NVIDIA can sell more big-memory GPUs for 10k USD? It can't be because of the new Ryzen AI PCs or the NVIDIA Sparks... neither of them has the compute... That's why two GPUs would be much more interesting, but your opinion would really be interesting...
Because a MoE model <100B is not powerful enough?
Wrong. Layer depth (how many layers) * (active parameters) == intelligence...
The additional parameters in a MoE are mostly just knowledge retrieval; that's why MoEs need more of them. But training ultra-high-layer-count LLMs is extremely difficult.
WIDTH (the model dimension) is what most labs scale, because it is much cheaper to train...
LAYERS (depth) is what they should scale...
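To make the width-versus-depth point a bit more concrete, here is a tiny sketch (my own illustration, assuming a plain dense transformer block costs roughly 12 * d_model^2 parameters and ignoring embeddings, MoE experts and routing) of how the same active-parameter budget can be spent on a deep-and-narrow or a shallow-and-wide model:

```python
# Illustration of the width-vs-depth trade-off argued above.
# Assumption (mine): one dense transformer block costs roughly
# 12 * d_model^2 parameters (4*d^2 attention + 8*d^2 for a 4x MLP);
# embeddings, MoE experts and routing are ignored, so only the shape
# of the trade-off is meaningful, not the exact numbers.

def block_params(d_model: int) -> int:
    """Approximate parameter count of one dense transformer block."""
    return 12 * d_model * d_model

def depth_for_budget(active_params: float, d_model: int) -> int:
    """How many layers a given active-parameter budget buys at width d_model."""
    return int(active_params // block_params(d_model))

ACTIVE_BUDGET = 12e9  # e.g. ~12B active (per-token) parameters

for d_model in (2048, 4096, 8192):
    layers = depth_for_budget(ACTIVE_BUDGET, d_model)
    print(f"d_model={d_model:5d} -> ~{layers:3d} layers at a {ACTIVE_BUDGET / 1e9:.0f}B active budget")
```

The point being: at a fixed active budget, halving the width buys roughly four times the depth, since each block's cost grows with the square of d_model.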