Queue delay and execution time considerably higher than with the 3.0 model?
#8 opened by rafa9
I deployed FailSpy's abliterated 3.0 model and this model, which seem to use the same techniques and have identical parameters. I'm using the same machine for both (2x80GB on RunPod), but the queue delay and execution time differ massively:
Queue delay:
Llama 70B 3.0: 0.02 secs
Llama 70B 3.1: 0.15 secs
Execution time:
Llama 70B 3.0: 0.65-0.8 secs
Llama 70B 3.1: 3-5 secs
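To put those numbers in perspective, here is a quick sketch of the slowdown arithmetic. The variable and function names are only illustrative, and the figures are just the timings quoted above, not new measurements. It works out to roughly 7.5x for queue delay and somewhere between about 3.75x and 7.7x for execution time.

```python
# Timings in seconds, copied from the measurements above.
queue_delay = {"3.0": 0.02, "3.1": 0.15}
execution_time = {"3.0": (0.65, 0.8), "3.1": (3.0, 5.0)}  # (low, high)


def slowdown_range(old, new):
    """Return the (smallest, largest) slowdown factor for two (low, high) ranges."""
    # Smallest factor: fastest new run against slowest old run.
    # Largest factor: slowest new run against fastest old run.
    return new[0] / old[1], new[1] / old[0]


queue_slowdown = queue_delay["3.1"] / queue_delay["3.0"]
exec_low, exec_high = slowdown_range(execution_time["3.0"], execution_time["3.1"])

print(f"Queue delay: about {queue_slowdown:.1f}x slower")
print(f"Execution time: about {exec_low:.1f}x to {exec_high:.1f}x slower")
```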
Models:
Llama 70B 3.0: https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Llama 70B 3.1: https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-lorablated
@mlabonne, any advice, please?
You're right. The original Meta-Llama-3.1-70B-Instruct is also slower in both queue delay and execution time. Closing this.
rafa9 changed discussion status to closed