144gb vram and 256gb ram
#12
by fuutott - opened
I'm trying to work out the best way to split the model so I can load as much as possible onto an RTX 6000 (96 GB) and an Ada A6000 (48 GB), with 256 GB of 8-channel DDR5 for the rest.
Would `-ot ".ffn_(up)_exps.=CPU"` be the right approach?
Sorry for the delay - if it helps, I wrote up roughly how to offload other layers in https://docs.unsloth.ai/basics/qwen3-coder#improving-generation-speed
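For anyone else reading: the regex passed to `-ot` / `--override-tensor` is matched against the model's tensor names, so you can check what a pattern will catch before loading anything. A minimal sketch (the tensor names below are illustrative examples of the `blk.N.ffn_*_exps.weight` naming used by GGUF MoE models, not dumped from this model):

```python
import re

# Example tensor names in the style of a GGUF MoE model (illustrative).
tensor_names = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.1.ffn_gate_exps.weight",
]

# The pattern from the question: only the up-projection expert tensors.
pattern = re.compile(r"\.ffn_(up)_exps\.")

cpu = [name for name in tensor_names if pattern.search(name)]
print(cpu)  # only the ffn_up_exps tensor matches
```

So that flag would pin only the up-projection experts to CPU; to push more onto system RAM you can widen the group, e.g. `".ffn_(up|down|gate)_exps.=CPU"`, and narrow it again once you know how much fits on the two GPUs.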