CPU or GPU Inference #54
by eggie5 - opened
Is this doing CPU inference?
This comment has been hidden
The model is not running in the Space itself; the Space is just a webapp that proxies calls to the HF inference API.
It's running on GPUs.
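For context, a Space that proxies to the hosted Inference API rather than running the model locally typically does something like this minimal sketch. The endpoint URL and payload shape follow the public `api-inference.huggingface.co` convention, the model ID is the one linked later in this thread, and the token handling is an assumption for illustration:

```python
# Hypothetical sketch: a Space forwarding prompts to the HF Inference API
# instead of loading the model in the Space itself.
import json
import os
import urllib.request

API_URL = ("https://api-inference.huggingface.co/models/"
           "OpenAssistant/oasst-sft-6-llama-30b-xor")

def query(prompt: str, token: str) -> dict:
    """POST a text-generation request to the Inference API."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")  # a real token is needed to call the API
    if token:
        print(query("Hello!", token))
```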
coyotte508 changed discussion status to closed
Any details on the inference setup?
eggie5 changed discussion status to open
g5 instances from AWS currently
@julien-c cool, glad to hear you don't need A100s to get speed like that. Are you using the base model https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor with no special quantization/distillation on top?
> g5 instances from AWS currently
Hello @julien-c,
Could you please share the parameters used for starting the server? I also happen to use a g5 instance for inference; the speed is good, but not as good as this demo.
Yes, not Docker but the CLI.
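For reference, launching Hugging Face's text-generation-inference server via its CLI launcher (rather than the Docker image) looks roughly like the sketch below. The model ID, shard count, and port are illustrative assumptions, not the actual parameters used for this demo:

```shell
# Hypothetical example: starting text-generation-inference with the CLI
# launcher. --num-shard splits the model across GPUs, e.g. on a multi-GPU
# g5 instance; values here are assumptions.
text-generation-launcher \
  --model-id OpenAssistant/oasst-sft-6-llama-30b-xor \
  --num-shard 4 \
  --port 8080
```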