dtype: float32 in base model vs. dtype: bfloat16 in the instruction fine-tuned model
In the base model, the dtype is float32; however, in the instruction fine-tuned model, the dtype is bfloat16 (https://huggingface.co/google/gemma-2-9b-it/blob/main/config.json#L29).
Is this inconsistency intentional or a bug?
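For reference, the declared dtype can be checked directly from each repo's config without downloading the weights. This is just an illustrative snippet (not part of the original report), using the model IDs discussed here:

```python
from transformers import AutoConfig

# Read the torch_dtype declared in each checkpoint's config.json.
base_cfg = AutoConfig.from_pretrained("google/gemma-2-9b")
it_cfg = AutoConfig.from_pretrained("google/gemma-2-9b-it")

print(base_cfg.torch_dtype)  # dtype declared by the base checkpoint
print(it_cfg.torch_dtype)    # dtype declared by the instruction-tuned checkpoint
```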
Hi, sorry for the late response.
The difference you're observing is a common and intentional optimization. The base model might be trained and saved in full precision (float32), while the instruction-tuned variant is likely distributed in lower precision (bfloat16 or float16) to maximize inference performance and reduce the memory footprint on target hardware.
When you load these models with the Hugging Face transformers library, it can handle the dtype automatically so you get the best performance on your available hardware.
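As a rough sketch of what that looks like in practice (the model ID is the one from this thread; `torch_dtype="auto"` simply picks up the value declared in config.json, and you can override it explicitly if you prefer):

```python
import torch
from transformers import AutoModelForCausalLM

# Let transformers use the dtype declared in the checkpoint's config.json.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    torch_dtype="auto",
)
print(model.dtype)  # expected to match the dtype declared in config.json

# Or override it explicitly, e.g. force bfloat16 regardless of the config.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
)
print(model_bf16.dtype)  # torch.bfloat16
```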
Thank you.