mlabonne/Llama-3.1-70B-Instruct-lorablated · Delay time and execution time considerably larger than with the 3.0 model?

Sep 20, 2024


edited Sep 20, 2024

I deployed FailSpy's abliterated 3.0 model and this model, which seem to use the same technique and have identical parameters. I'm using the same machine for both (2x 80 GB on RunPod), but the execution time and queue delay are massively different:
Queue delay:
Llama70B 3.0: 0.02 secs
Llama70B 3.1: 0.15 secs

Execution time:
Llama70B 3.0: 0.65-0.8 secs
Llama70B 3.1: 3-5 secs
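To put the reported numbers in proportion, here is a small sketch that turns them into slowdown ratios. The timings are copied from the post above; the helper name `slowdown_range` is just for illustration. Where a range was reported, the smallest ratio pairs the fastest 3.1 run with the slowest 3.0 run, and the largest ratio does the reverse.

```python
# Timings reported in the post, in seconds, stored as (low, high) ranges.
queue_delay = {"3.0": (0.02, 0.02), "3.1": (0.15, 0.15)}
execution_time = {"3.0": (0.65, 0.8), "3.1": (3.0, 5.0)}


def slowdown_range(old, new):
    """Return the (min, max) ratio new/old across two reported ranges."""
    # Smallest ratio: fastest new run vs. slowest old run; largest: the reverse.
    return new[0] / old[1], new[1] / old[0]


for name, metric in (("queue delay", queue_delay), ("execution time", execution_time)):
    low, high = slowdown_range(metric["3.0"], metric["3.1"])
    print(f"{name}: {low:.2f}x to {high:.2f}x slower")
```

By this arithmetic the 3.1 deployment is roughly 7.5x slower in queue delay and roughly 3.75x to 7.7x slower in execution time.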

Models:
Llama 70B 3.0: https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Llama 70B 3.1: https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-lorablated

@mlabonne, any advice, please?

mlabonne

Owner Sep 20, 2024

Hey @rafa9, sorry, I have no idea. In theory, you shouldn't see any difference. Have you tried running the official Meta-Llama-3.1-70B-Instruct?
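One way to run the suggested comparison is to time the same request against each deployment under identical conditions: same prompt, same generation settings, same hardware. The sketch below assumes a hypothetical `send_request(prompt)` callable that you would write to hit each endpoint; the `fake_fast` and `fake_slow` functions are stand-ins that just sleep so the example runs on its own. Warm-up calls are discarded so that one-off startup costs don't skew the comparison. Keeping the output length fixed matters too, because generation time grows with the number of tokens produced.

```python
import statistics
import time
from typing import Callable


def benchmark(send_request: Callable[[str], str], prompt: str,
              runs: int = 5, warmup: int = 1) -> dict:
    """Time repeated calls to send_request; return latency stats in seconds."""
    # Warm-up calls absorb one-off costs (cold start, cache fills) and are not recorded.
    for _ in range(warmup):
        send_request(prompt)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request(prompt)
        timings.append(time.perf_counter() - start)
    return {"median": statistics.median(timings), "min": min(timings), "max": max(timings)}


# Hypothetical stand-ins: replace with real calls to each deployed endpoint.
def fake_fast(prompt: str) -> str:
    time.sleep(0.01)
    return "ok"


def fake_slow(prompt: str) -> str:
    time.sleep(0.03)
    return "ok"


if __name__ == "__main__":
    prompt = "Write a haiku about GPUs."
    for name, fn in (("model A", fake_fast), ("model B", fake_slow)):
        stats = benchmark(fn, prompt, runs=3)
        print(f"{name}: median {stats['median']:.3f}s "
              f"(min {stats['min']:.3f}s, max {stats['max']:.3f}s)")
```

Reporting the median rather than a single run makes the comparison less sensitive to one slow outlier.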

rafa9

Sep 21, 2024

You're right. The official Meta-Llama-3.1-70B-Instruct is also slower, in both queue delay and execution time, so the slowdown doesn't seem to come from the lorablated version. Closing this.

rafa9 changed discussion status to closed Sep 21, 2024