fixie-ai/ultravox-v0_5-llama-3_2-1b · Can we use the same transformers.pipeline for multiple async calls to the model?

Hi,

Suppose I create a pipe instance for the Ultravox engine as below:

import transformers

pipe = transformers.pipeline(
    model='fixie-ai/ultravox-v0_5-llama-3_2-1b',
    trust_remote_code=True,
    trust_repo=True
)

Can the same pipe be used for multiple simultaneous asynchronous interactions? For example, if we have five interactions with five different endpoints, can we use the same pipe to extract information from the Ultravox engine asynchronously, or do we need to create a separate pipe instance for each interaction?
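To make the question concrete, here is a minimal sketch of the kind of sharing meant here: one pipe object, five concurrent async callers. The names `SharedPipe`, `fake_pipe`, and `main` are hypothetical, and `fake_pipe` is only a stand-in so the sketch runs without a GPU. It assumes the safest case, where calls are serialized with a lock; whether the real Ultravox pipe can be called concurrently without that lock is exactly what is being asked.

```python
import asyncio
import threading


class SharedPipe:
    """Wraps one pipeline-like callable so concurrent callers take turns."""

    def __init__(self, pipe):
        self._pipe = pipe
        # Assumption: the pipe is not safe to call concurrently,
        # so access to the single instance is serialized.
        self._lock = threading.Lock()

    def _call(self, inputs):
        with self._lock:
            return self._pipe(inputs)

    async def infer(self, inputs):
        # Run the blocking call in a worker thread so the event loop
        # can keep serving the other interactions.
        return await asyncio.to_thread(self._call, inputs)


def fake_pipe(inputs):
    # Hypothetical stand-in for the real Ultravox pipe.
    return f"response to {inputs['id']}"


async def main():
    shared = SharedPipe(fake_pipe)  # created once, reused by every caller
    requests = [{"id": i} for i in range(5)]
    # gather returns results in the same order as the requests.
    return await asyncio.gather(*(shared.infer(r) for r in requests))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this pattern the five interactions share one loaded model, but the calls still run one after another inside the lock, so it saves memory and load time rather than adding throughput.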

Also, initializing the pipe takes quite a bit of time, almost 7-10 seconds. We are currently using the following configuration:

OS : Ubuntu Server 22.04
GPU : 1x RTX A6000 (48GB) [Premium]
CPU : 6 vCPU, 96 GB RAM, 300 GB Storage

Is there a recommended configuration to make the model load faster?

Thanks.

Arshad.