OpenAI model 120B

#2 by gopi87

I just tested the OpenAI 120B; it's looking great on code generation.

Given it is 120B dense, all the weights are "active", so without full VRAM offload it will be slower. EXL3 would probably be a good quant target if support lands there. I may look into some KT Trellis quants once support lands in ik_llama.cpp.

If you want to do hybrid CPU+GPU, it won't be nearly as fast as GLM-4.5, which has only ~32B active parameters.
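
For reference, here's a minimal sketch of hybrid CPU+GPU inference using llama-cpp-python; the GGUF filename and layer count are assumptions, so tune `n_gpu_layers` to whatever fits in your VRAM:

```python
# Minimal hybrid CPU+GPU sketch with llama-cpp-python (built with GPU support).
# The GGUF filename and n_gpu_layers value are hypothetical; adjust for your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",  # hypothetical quant filename
    n_gpu_layers=20,  # layers offloaded to the GPU; the rest run on the CPU
    n_ctx=8192,       # context window size
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```

The fewer parameters that are actually read per token, the less the CPU-resident layers cost per decode step, which is why the active-parameter count matters so much here.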

We'll see how it goes!

Mate, it's a 120B MoE with 5B active.

And I was using it in llama.cpp.

@gopi87

Each model is a Transformer which leverages mixture-of-experts (MoE[2]) to reduce the number of active parameters needed to process input. gpt-oss-120b activates 5.1B parameters per token, while gpt-oss-20b activates 3.6B. The models have 117B and 21B total parameters, respectively.
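
As a back-of-envelope sketch of why the active-parameter count dominates CPU-side decode speed (the bandwidth and bits-per-weight figures below are illustrative assumptions, not benchmarks):

```python
# Rough upper bound on decode speed: each token requires reading roughly
# (active params x bytes per weight) from memory, so tokens/s is capped by
# memory bandwidth / bytes per token. All figures here are assumptions.
def max_tokens_per_sec(active_params_billions, bytes_per_weight, bandwidth_gb_s):
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~4.5 bits/weight (a Q4_K-ish quant) is about 0.56 bytes; assume ~100 GB/s DRAM bandwidth
for name, active_b in [("gpt-oss-120b", 5.1), ("gpt-oss-20b", 3.6), ("GLM-4.5", 32.0)]:
    print(f"{name}: ~{max_tokens_per_sec(active_b, 0.56, 100):.0f} tok/s upper bound")
```

So at equal bandwidth, a 5.1B-active token costs roughly a sixth of a 32B-active token, which is why the hybrid CPU+GPU picture changes completely once you know it's a MoE.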

Ahh, so it is, lol. Finally saw something about it; I've been so busy trying to get GLM-4.5/Air over the finish line, lol.

Yep, and the ratio is really perfect, plus it did well in my vibe code check.
