Using 128K as Context Window

#2
by mdobbali - opened

Hi @xingyaoww

Curious about the serving engine you used. Did you use vLLM to serve? If so, did it work? Could you please share the machine you used and your serving settings? I am running into OOM issues on a g5.12xlarge with Qwen2.5 7B and a 131k context window. Only one GPU gets loaded with the model, even when I enable tensor parallelism.
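For concreteness, here is a minimal sketch of the kind of setup being asked about, using vLLM's offline `LLM` entry point. The checkpoint name and parameter values are illustrative, not confirmed from the thread:

```python
# Minimal sketch (illustrative values): serving Qwen2.5-7B with a 131k context
# window sharded across all 4 A10G GPUs of a g5.12xlarge via tensor parallelism.
# Note: Qwen2.5-7B ships with a 32k default context; per the model card, reaching
# 131k requires enabling the YaRN rope-scaling settings in the model config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed checkpoint
    max_model_len=131072,              # full 128K window; lower this first if OOM persists
    tensor_parallel_size=4,            # shard weights and KV cache across all 4 GPUs
    gpu_memory_utilization=0.90,       # fraction of each GPU reserved for vLLM
)

# Quick smoke test that the engine is up and generating.
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

If only one GPU fills up despite `tensor_parallel_size=4`, that usually means the tensor-parallel setting is not actually reaching the engine (e.g. it was passed to the client rather than the server launch), so checking where the flag is set is a reasonable first step.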
