GPU for inference
#3
opened by vt404v2
The chat with h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2 at https://gpt-gm.h2o.ai/ looks very fast. Can you tell me which GPU you are using for inference? I get about 6.5 tokens/s with a 500-token prompt and 32 new tokens on an A100 80GB.
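For reference, a throughput figure like 6.5 tokens/s can be measured by timing one generation call and dividing the number of new tokens by the elapsed wall-clock time. A minimal sketch of that measurement follows; `tokens_per_second` and the stub `fake_generate` are hypothetical helpers (the stub stands in for a real model or endpoint call), not anything from this thread.

```python
import time

def tokens_per_second(generate_fn, prompt, max_new_tokens):
    """Time one generation call and report throughput for the new tokens only.

    generate_fn(prompt, max_new_tokens) is assumed to block until all
    max_new_tokens tokens have been produced.
    """
    start = time.perf_counter()
    generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return max_new_tokens / elapsed

# Stub standing in for a real model call; pretends to emit tokens
# at a fixed rate of 100 tokens/s.
def fake_generate(prompt, max_new_tokens, seconds_per_token=0.01):
    time.sleep(max_new_tokens * seconds_per_token)

rate = tokens_per_second(fake_generate, "some 500-token prompt", 32)
print(f"{rate:.1f} tokens/s")
```

Note that this counts only the decode phase as seen by the caller; prompt length still matters because prefill time is included in the elapsed interval.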
We are hosting the model on an A100 80GB using the awesome inference repository from Hugging Face: https://github.com/huggingface/text-generation-inference.
Actually, the GPU is even shared with the other 7B model.
Thanks, it works for me.
vt404v2 changed discussion status to closed