144gb vram and 256gb ram

#12
by fuutott - opened

I'm trying to work out the best way to split the model so I can load as much as possible onto an RTX 6000 (96 GB) and an Ada A6000 (48 GB), with 256 GB of 8-channel DDR5 for the rest.
Is -ot ".ffn_(up)_exps.=CPU" the right approach?

Unsloth AI org

Sorry for the delay - if it helps, I wrote up roughly how to offload other layers at https://docs.unsloth.ai/basics/qwen3-coder#improving-generation-speed
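
As a rough sketch of the idea from the docs (not a tested config - the model path, regex, and numbers below are placeholders to adapt): keep attention and shared weights on the GPUs, push the MoE expert FFN tensors back to system RAM with -ot, and split what stays on GPU across the two cards roughly in proportion to their VRAM with -ts.

```bash
# Hypothetical llama.cpp launch; adjust paths, regex, and sizes to your setup.
# -ngl 99   offload all layers to GPU by default
# -ot ...   then override the MoE expert FFN tensors back to CPU/RAM
# -ts 96,48 split GPU-resident tensors ~2:1 across the 96 GB and 48 GB cards
./llama-server \
  -m model.gguf \
  -ngl 99 \
  -ot ".ffn_(up|down)_exps.=CPU" \
  -ts 96,48 \
  -c 32768 \
  --threads 32
```

If that still overflows VRAM, offloading more of the expert tensors (e.g. gate as well as up/down) trades speed for memory; the docs page linked above walks through which layers to move first.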
