float32 vs bf16

#5
by janimo - opened

Why the difference between this and the -it model dtypes?

Full precision is usually useful for pre-training. For inference, using bfloat16 should be good :)
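For example, here is a minimal sketch of loading in bfloat16 for inference with `transformers` (the checkpoint id `google/recurrentgemma-2b` is assumed for illustration; adjust it to the model you are using):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed checkpoint id for this sketch

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the f32 checkpoint directly in bfloat16 for inference.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```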

Other models (even regular Gemma) use bf16 for both the base and -it models, hence my question about the rationale here. Is f32 needed for proper RecurrentGemma fine-tuning?

Google org

f32 is not needed for fine-tuning; you can fine-tune with either f32 or bf16.
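As a rough sketch of what that choice looks like in practice (again assuming the `google/recurrentgemma-2b` checkpoint id), the dtype is just the `torch_dtype` you load the weights in before training:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed checkpoint id for this sketch
dtype = torch.bfloat16                 # or torch.float32; both work for fine-tuning

tokenizer = AutoTokenizer.from_pretrained(model_id)
batch = tokenizer("RecurrentGemma fine-tuning example.", return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One toy training step (language-modeling loss on the input itself).
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```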

Google org

Just to add to @AnushanF's comment: in the code we always do the recurrence (the RG-LRU layer) in f32, even when fine-tuning the overall model in bf16, as we found this to work much better.
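For intuition, here is a minimal PyTorch sketch of that pattern. It is not the actual RG-LRU implementation, just a toy diagonal linear recurrence whose scan is carried out in float32 even when the surrounding activations are bf16:

```python
import torch

def linear_recurrence_f32(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Toy diagonal linear recurrence h_t = a_t * h_{t-1} + x_t.

    Inputs may be bf16; the scan itself runs in float32 and the result is
    cast back to the input dtype, mirroring the idea of keeping the
    recurrence in higher precision than the rest of the model.
    """
    orig_dtype = x.dtype
    x32, a32 = x.float(), a.float()          # upcast before the scan
    h = torch.zeros_like(x32[:, 0])
    outputs = []
    for t in range(x32.shape[1]):            # sequential scan over time
        h = a32[:, t] * h + x32[:, t]
        outputs.append(h)
    return torch.stack(outputs, dim=1).to(orig_dtype)  # downcast afterwards

# Example: bf16 activations and gates, f32 recurrence internally.
x = torch.randn(2, 16, 64, dtype=torch.bfloat16)
a = torch.sigmoid(torch.randn(2, 16, 64)).to(torch.bfloat16)
y = linear_recurrence_f32(x, a)
print(y.dtype)  # torch.bfloat16
```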

Google org

Hi @janimo, @sohamde, hope this issue is resolved. Please close it, and feel free to reopen if any further issues arise. Thank you.
