evals (PT vs IT)

#30
by erichartford - opened

Hello,

The evals in the model card are for the "PT" version, but this is the "IT" version.


Presumably the "IT" version will have better scores than the "PT" version, right?

Do you have the scores for the "IT" version to publish here?

Maybe it's here, on page 23 of the report: https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

However, it's hard for me to reproduce the scores (i.e., gsm8k, humaneval, mbpp) with lm-evaluation-harness, and I don't know where the gap comes from :(
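One thing that might be worth ruling out (an assumption on my part, not a confirmed cause of the gap): with an "IT" checkpoint, scores can shift depending on whether the harness wraps prompts in the model's chat template. Below is a minimal sketch that only builds the `lm_eval` command line, so the two settings can be compared side by side. The helper name `build_lm_eval_command` and the model id `your-org/your-model-it` are hypothetical placeholders, and the few-shot count is arbitrary, not the setting used in the report.

```python
import shlex


def build_lm_eval_command(model_id, tasks, num_fewshot=None, use_chat_template=True):
    """Build an lm-evaluation-harness CLI invocation as a list of arguments.

    model_id is a placeholder for whichever checkpoint is being evaluated.
    """
    cmd = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--batch_size", "auto",
    ]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if use_chat_template:
        # Wrap each prompt in the tokenizer's chat template.
        cmd.append("--apply_chat_template")
        if num_fewshot:
            # Present few-shot examples as alternating user/assistant turns.
            cmd.append("--fewshot_as_multiturn")
    return cmd


if __name__ == "__main__":
    # Hypothetical model id and an arbitrary few-shot count, for illustration only.
    for chat in (True, False):
        cmd = build_lm_eval_command(
            "your-org/your-model-it", ["gsm8k"], num_fewshot=8, use_chat_template=chat
        )
        print(shlex.join(cmd))
```

Running both printed commands on the same task and comparing the results would at least show how much of the gap is due to prompt formatting rather than the model itself.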

