CPU or GPU Inference #54
by eggie5 - opened
Is this doing CPU inference?
This comment has been hidden
The model is not running in the Space itself; the Space is just a webapp that proxies calls to the HF inference API.
It's running on GPUs.
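For context, a Space that proxies to the hosted Inference API rather than running the model locally typically does something like this minimal sketch. The endpoint URL and payload shape follow the public `api-inference.huggingface.co` convention, the model ID is the one linked later in this thread, and the token handling is an assumption for illustration:

```python
# Hypothetical sketch: a Space forwarding prompts to the HF Inference API
# instead of loading the model in the Space itself.
import json
import os
import urllib.request

API_URL = ("https://api-inference.huggingface.co/models/"
           "OpenAssistant/oasst-sft-6-llama-30b-xor")

def query(prompt: str, token: str) -> dict:
    """POST a text-generation request to the Inference API."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")  # a real token is needed to call the API
    if token:
        print(query("Hello!", token))
```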
coyotte508 changed discussion status to closed
Any details on the inference setup?
eggie5 changed discussion status to open
g5 instances from AWS currently
@julien-c cool, glad to hear you don't need A100s to get speed like that. Are you using the base model https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor with no special quantization/distillation on top?
> g5 instances from AWS currently
Hello @julien-c,
Could you please share the parameters used for starting the server? I also happen to use a g5 instance for inference; the speed is good, but not as good as this demo.
Yes, not Docker but the CLI.
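For reference, launching Hugging Face's text-generation-inference server via its CLI launcher (rather than the Docker image) looks roughly like the sketch below. The model ID, shard count, and port are illustrative assumptions, not the actual parameters used for this demo:

```shell
# Hypothetical example: starting text-generation-inference with the CLI
# launcher. --num-shard splits the model across GPUs, e.g. on a multi-GPU
# g5 instance; values here are assumptions.
text-generation-launcher \
  --model-id OpenAssistant/oasst-sft-6-llama-30b-xor \
  --num-shard 4 \
  --port 8080
```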