Can we use the same transformers.pipeline for multiple async calls to the model?
#4 opened by awppatel
Hi,
Once I create a pipe instance for the Ultravox engine as below:
import transformers

pipe = transformers.pipeline(
    model='fixie-ai/ultravox-v0_5-llama-3_2-1b',
    trust_remote_code=True,
    trust_repo=True
)
Can the same pipe be used for multiple simultaneous asynchronous interactions? For example, if we have five interactions with five different endpoints, can we use the same pipe to extract information from the Ultravox engine asynchronously, or do we need to create a separate pipe instance for each interaction?
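To make the scenario concrete, here is a minimal sketch of the pattern I have in mind: one shared pipe, five concurrent asyncio interactions. FakePipe is a hypothetical stand-in so the snippet runs without loading the model, and SharedPipe and its infer method are names made up for illustration. I do not know whether the real pipeline object is safe to call from several threads at once, so the sketch assumes it is not and serializes calls with a lock.

```python
import asyncio
import threading
import time


class FakePipe:
    """Hypothetical stand-in for the object returned by transformers.pipeline."""

    def __call__(self, inputs):
        time.sleep(0.01)  # simulate a blocking inference call
        return f"reply to {inputs}"


class SharedPipe:
    """Wraps one pipe so many coroutines can share it (illustrative only)."""

    def __init__(self, pipe):
        self._pipe = pipe
        # Assumption: the pipe is not thread-safe, so only one call runs at a time.
        self._lock = threading.Lock()

    def _run(self, inputs):
        with self._lock:
            return self._pipe(inputs)

    async def infer(self, inputs):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._run, inputs)


async def main():
    shared = SharedPipe(FakePipe())  # created once, reused by every interaction
    prompts = [f"interaction-{i}" for i in range(5)]
    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(shared.infer(p) for p in prompts))


results = asyncio.run(main())
```

With this layout the model is loaded only once, but the calls are processed one after another inside the lock, so it shows sharing rather than true parallel inference. Requires Python 3.9 or later for asyncio.to_thread.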
Also, initializing the pipe takes quite a bit of time, roughly 7-10 seconds. We are currently using the following configuration:
OS : Ubuntu Server 22.04
GPU : 1x RTX A6000 (48GB) [Premium]
CPU : 6 vCPU, 96 GB RAM, 300 GB Storage
Is there a recommended configuration that would make the model load faster?
Thanks.
Arshad.