Endless generation

#1
by veden - opened

First, thank you for creating this lorablated model.

I'm using TabbyAPI with the 6-bit quant (8-bit head weights) of this 70B. Using either the chat_template from tokenizer_config.json or a custom one, I'm seeing endless generation, with the word assistant periodically appearing between what look like complete LLM responses.

Adding `assistant` (without any surrounding spaces) as a stop string seems to stop the endless generation. Is this the expected fix, or do you have any thoughts on how or why this may be happening?
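For reference, this is the workaround I'm using: TabbyAPI exposes an OpenAI-compatible completions endpoint that accepts a `stop` list, so the bare word can be passed per request. A minimal sketch of the request payload (the model name and other values here are placeholders, not from this repo):

```python
import json

# Hedged sketch of an OpenAI-compatible completion payload for TabbyAPI.
# The model name and prompt are hypothetical placeholders; the relevant
# part is the "stop" list, which halts generation as soon as the bare
# token "assistant" leaks into the output.
payload = {
    "model": "Llama-3.1-70B-lorablated",  # placeholder model name
    "prompt": "Explain LoRA ablation in one paragraph.",
    "max_tokens": 256,
    "stop": ["assistant"],  # stop string workaround for endless generation
}

# Serialized body, ready to POST to /v1/completions
body = json.dumps(payload)
```

This is only a client-side band-aid; it truncates output rather than fixing the underlying stop-token handling.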

Owner

I've added a generation_config.json, that might fix your problem if TabbyAPI relies on it. If you still see this issue, could you try the non-lorablated L3.1 70B and tell me if it works for you?
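The likely cause: Llama 3.1 chat turns end with `<|eot_id|>` rather than `<|end_of_text|>`, so a backend that only stops on the single EOS from the model config keeps sampling past each reply, emitting the `assistant` header of the next turn. The generation_config.json declares all terminators. As an illustrative sketch (values mirror what the upstream L3.1 Instruct release ships, but verify against the actual file):

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128008, 128009],
  "do_sample": true,
  "temperature": 0.6,
  "top_p": 0.9
}
```

Here 128009 is `<|eot_id|>`; once the backend treats it as EOS, generation ends at the turn boundary and no stop-string workaround is needed.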

The generation_config.json seems to have fixed the issue. Thank you for the assistance.

The original L3.1 70B works without issue.

Owner

Excellent, thanks for your feedback!

veden changed discussion status to closed
