Queue delay and execution time considerably higher than with the 3.0 model?

#8
by rafa9 - opened

I deployed the 3.0 FailSpy abliterated model and this model, which seem to use the same techniques and have identical parameters. I'm using the same machine (2x80GB on RunPod) for both, but the execution time and queue delay differ massively:
Queue delay:
Llama70B 3.0: 0.02 secs
Llama70B 3.1: 0.15 secs

Execution time:
Llama70B 3.0: 0.65-0.8 secs
Llama70B 3.1: 3-5 secs
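
To put the gap in relative terms, here is a minimal sketch that turns the timings above into slowdown factors. The numbers are copied from this post; the dictionary layout and the `slowdown_range` helper are only illustrative names, not part of any tool.

```python
# Timings copied from the figures above, in seconds, as (low, high) ranges.
queue_delay = {"3.0": (0.02, 0.02), "3.1": (0.15, 0.15)}
execution = {"3.0": (0.65, 0.8), "3.1": (3.0, 5.0)}


def slowdown_range(timings):
    """Return the (smallest, largest) 3.1 / 3.0 ratio implied by the ranges."""
    old_low, old_high = timings["3.0"]
    new_low, new_high = timings["3.1"]
    # Smallest ratio: fastest 3.1 run vs. slowest 3.0 run; largest is the reverse.
    return new_low / old_high, new_high / old_low


for name, timings in (("queue delay", queue_delay), ("execution", execution)):
    low, high = slowdown_range(timings)
    print(f"{name}: {low:.2f}x to {high:.2f}x slower")
```

That works out to about 7.5x for queue delay and somewhere between roughly 3.75x and 7.7x for execution time.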

Models:
Llama 70B 3.0: https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Llama 70B 3.1: https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-lorablated

@mlabonne any advice, please?

Hey @rafa9, sorry, I have no idea. In theory, you shouldn't see any difference. Have you tried running the official Meta-Llama-3.1-70B-Instruct?
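
For a like-for-like comparison between deployments, something along these lines could be used to time the same request against each one. This is only a rough sketch: `time_calls`, `summarize`, and `fake_request` are hypothetical names, and `fake_request` is a placeholder to be swapped for whatever call actually hits the endpoint.

```python
import statistics
import time


def time_calls(send_request, runs=5):
    """Call send_request() `runs` times and return per-call wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Placeholder standing in for a real request to a deployed model (assumption).
def fake_request():
    time.sleep(0.01)


print(summarize(time_calls(fake_request, runs=3)))
```

Running it with the same prompt and generation settings against each model keeps the comparison fair.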

You're right. The original Meta-Llama-3.1-70B-Instruct also shows the longer queue delay and execution time. Closing this.

rafa9 changed discussion status to closed
