Added vLLM offline serve working code.
#107 · opened by hrithiksagar-tih
In this commit, I have attached working inference code for the gpt-oss-20b model via vLLM. The original code in the cookbook (https://cookbook.openai.com/articles/gpt-oss/run-vllm) was not working for me; with a few modifications, it worked.
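The exact modifications aren't shown in this thread, but for reference, a minimal sketch of offline gpt-oss-20b inference along the lines of the cookbook example, assuming the standard vLLM `LLM`/`SamplingParams` offline API (the model name and sampling values here are illustrative, not the PR's actual code):

```python
# Minimal sketch of offline gpt-oss-20b inference with vLLM.
# Assumption: standard vLLM offline API; not the PR's exact fix.
from vllm import LLM, SamplingParams


def main():
    # Load the model; the thread above mentions running on H100s.
    llm = LLM(model="openai/gpt-oss-20b")

    # Illustrative sampling settings.
    sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

    # Chat-style prompt; vLLM applies the model's chat template internally.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what vLLM is in one sentence."},
    ]

    outputs = llm.chat(messages, sampling)
    print(outputs[0].outputs[0].text)


if __name__ == "__main__":
    main()
```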
What GPU are you using? Ampere, Ada Lovelace?
I used H100s.
Thank you @hrithiksagar-tih! Could you instead open a PR against github.com/openai/gpt-oss? I'll copy it back into both model cards. Thanks!
dkundel-openai changed pull request status to closed
Yes, I will do it.
Thanks!