Using 128K as Context Window
#2 · opened by mdobbali
Hi @xingyaoww
Curious about the serving engine you used. Did you use vLLM to serve, and if so, did it work? Could you please share the machine you used and your serving settings? I am running into OOM issues on a g5.12xlarge with Qwen2.5-7B and a 131k context window. Only one GPU is loaded with the model, even though I am using tensor parallelism.
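
For reference, this is roughly how I am launching it via vLLM's Python API (a minimal sketch; the exact model ID and the test generation are placeholders for my setup):

```python
from vllm import LLM, SamplingParams

# Sketch of my setup: Qwen2.5-7B with a 131k context on a g5.12xlarge
# (4x A10G, 24 GB each). tensor_parallel_size=4 should shard the weights
# across all four GPUs, but I only see one GPU being loaded.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model ID
    tensor_parallel_size=4,            # one shard per A10G
    max_model_len=131072,              # 128K context window
    gpu_memory_utilization=0.90,       # headroom for the KV cache
)

# Quick smoke test
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

(The `vllm serve` CLI accepts the same options as flags, e.g. `--tensor-parallel-size 4 --max-model-len 131072`.)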