When evaluating on Wiki2, I just get Loss: NaN, while with gemma-3-1b-it it works.
Why doesn't it work for the -pt version? Can someone help?
```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    args.path,
    torch_dtype=getattr(torch, args.torch_dtype.split('.')[-1]),
    trust_remote_code=True,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()
```
```python
# Inside the sliding-window evaluation loop over the Wiki2 tokens:
with torch.no_grad():
    outputs = model(input_ids, labels=target_ids)
loss = outputs.loss
if torch.isnan(loss):
    print(f"NaN loss at i={i}, begin={begin}, end={end}")
    continue
```
Hi @jonny-vr ,
Welcome to the Google Gemma family of open models. The primary distinction between the pre-trained (pt) and instruction-tuned (it) models lies in their training objectives: pre-trained models are trained on general text from sources such as Wikipedia and books, whereas instruction-tuned models undergo further training specifically to follow instructions.
I have run both the pre-trained and instruction-tuned models locally and evaluated their loss values; both produce finite numeric losses. Please find the attached gist file for your reference. I tested with the normal sample IDs available in the sample example code.
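For illustration, here is a minimal sketch of that kind of sanity check (the checkpoint IDs and the toy prompt are assumptions on my side, not the exact contents of the gist):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative checkpoints; both should return a finite language-modelling loss.
for model_id in ("google/gemma-3-1b-pt", "google/gemma-3-1b-it"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to(device).eval()

    enc = tokenizer("The quick brown fox jumps over the lazy dog.",
                    return_tensors="pt").to(device)
    with torch.no_grad():
        # Reusing the input IDs as labels gives the standard causal-LM loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    print(model_id, "loss:", loss.item())
```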
Key points to consider:
- Kindly verify that the parameters and arguments passed to the model, particularly the data type, are correct. Gemma checkpoints are released in bfloat16, and loading them in an unsupported data type such as float16 can cause numerical overflow, which shows up as NaN or erroneous outputs.
- The issue may also stem from the input and label IDs provided to the model; for example, a window whose labels are entirely masked with -100 yields a NaN mean loss. A sketch covering both points follows this list.
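Here is a hedged sketch of a sliding-window perplexity evaluation on WikiText-2 that addresses both points, i.e. bfloat16 loading and -100 label masking. The model ID, window size, and stride below are illustrative assumptions, not your exact script:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-pt"   # illustrative model ID
device = "cuda" if torch.cuda.is_available() else "cpu"

# Gemma checkpoints are published in bfloat16; loading them in float16 can
# overflow and yield NaN losses, so prefer bfloat16 (or float32 on CPU).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device).eval()

# Tokenise the WikiText-2 test split as one long sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

# Sliding-window evaluation: only the last `stride` tokens of each window are
# scored; the overlapping context tokens are masked with -100 so they do not
# contribute to the loss (a window with all labels masked would itself be NaN).
max_len, stride = 1024, 512
seq_len = encodings.input_ids.size(1)
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-stride] = -100   # ignore the overlapping context portion
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    print(f"begin={begin}, end={end}, loss={loss.item():.4f}")
    if end == seq_len:
        break
```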
If you require any further help, feel free to reach out; I'm more than happy to help you out.
Thanks.