Using 128K as Context Window
#2 · opened by mdobbali
Hi @xingyaoww
Curious about the serving engine you used. Did you use vLLM to serve, and if so, did it work? Could you please share the machine you used and your serving settings? I am running into OOM issues on a g5.12xlarge with Qwen2.5-7B and a 131k context window. Only one GPU is loaded with the model, even though I am using tensor parallelism.
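
For reference, this is roughly how I am launching it via vLLM's Python API (a minimal sketch; the exact model ID and the test generation are placeholders for my setup):

```python
from vllm import LLM, SamplingParams

# Sketch of my setup: Qwen2.5-7B with a 131k context on a g5.12xlarge
# (4x A10G, 24 GB each). tensor_parallel_size=4 should shard the weights
# across all four GPUs, but I only see one GPU being loaded.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model ID
    tensor_parallel_size=4,            # one shard per A10G
    max_model_len=131072,              # 128K context window
    gpu_memory_utilization=0.90,       # headroom for the KV cache
)

# Quick smoke test
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

(The `vllm serve` CLI accepts the same options as flags, e.g. `--tensor-parallel-size 4 --max-model-len 131072`.)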