OpenAI model 120B

#2 by gopi87

I just tested the OpenAI 120B; it's looking great on code generation.

Given it is 120B dense, all the weights are "active", so without full VRAM offload it will be slower. EXL3 would probably be a good quant target if support lands there. I may look into some KT Trellis quants once support lands in ik_llama.cpp.

If you want to do hybrid CPU+GPU, it won't be nearly as fast as GLM-4.5, which has only ~32B active parameters.
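
For reference, here's a minimal sketch of hybrid CPU+GPU inference using llama-cpp-python; the GGUF filename and layer count are assumptions, so tune `n_gpu_layers` to whatever fits in your VRAM:

```python
# Minimal hybrid CPU+GPU sketch with llama-cpp-python (built with GPU support).
# The GGUF filename and n_gpu_layers value are hypothetical; adjust for your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",  # hypothetical quant filename
    n_gpu_layers=20,  # layers offloaded to the GPU; the rest run on the CPU
    n_ctx=8192,       # context window size
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```

The fewer parameters that are actually read per token, the less the CPU-resident layers cost per decode step, which is why the active-parameter count matters so much here.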

We'll see how it goes!

Mate, it's a 120B MoE with 5B active.

And I was using it in llama.cpp.

@gopi87

Each model is a Transformer which leverages mixture-of-experts (MoE[2]) to reduce the number of active parameters needed to process input. gpt-oss-120b activates 5.1B parameters per token, while gpt-oss-20b activates 3.6B. The models have 117B and 21B total parameters, respectively.
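
As a back-of-envelope sketch of why the active-parameter count dominates CPU-side decode speed (the bandwidth and bits-per-weight figures below are illustrative assumptions, not benchmarks):

```python
# Rough upper bound on decode speed: each token requires reading roughly
# (active params x bytes per weight) from memory, so tokens/s is capped by
# memory bandwidth / bytes per token. All figures here are assumptions.
def max_tokens_per_sec(active_params_billions, bytes_per_weight, bandwidth_gb_s):
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~4.5 bits/weight (a Q4_K-ish quant) is about 0.56 bytes; assume ~100 GB/s DRAM bandwidth
for name, active_b in [("gpt-oss-120b", 5.1), ("gpt-oss-20b", 3.6), ("GLM-4.5", 32.0)]:
    print(f"{name}: ~{max_tokens_per_sec(active_b, 0.56, 100):.0f} tok/s upper bound")
```

So at equal bandwidth, a 5.1B-active token costs roughly a sixth of a 32B-active token, which is why the hybrid CPU+GPU picture changes completely once you know it's a MoE.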

Ahh, so it is, lol. Finally saw something about it; I've been so busy trying to get GLM-4.5/Air over the finish line, lol.

Yep, and the ratio is really perfect, plus it did well in my vibe code check.
