Reproducing accuracy results

#3
by dtrawins - opened

What is the recommended way to reproduce the accuracy results reported in the model card?
When I run the evaluation with https://github.com/EvolvingLMMs-Lab/lmms-eval, the results are slightly lower:

`lmms-eval --model internvl2 --model_args=pretrained=OpenGVLab/InternVL2_5-4B --tasks mmmu_val`
| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val|      0|none  |     0|mmmu_acc|↑  |0.4911|±  |   N/A|

That is lower than the 52.3 reported in the model card (49.11 vs. 52.3, a gap of about 3 points). Was a different dataset or split used? I see a similar gap of roughly 3 points for the 8B and 1B models as well. Is some extra tuning or configuration needed?
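For scale, here is a rough check of whether a ~3-point gap could be sampling noise, since lmms-eval printed `N/A` for the standard error. This is a minimal sketch that assumes MMMU val has 900 questions, each scored independently as right or wrong, so the binomial standard error applies. It only measures how precisely 900 questions pin down accuracy; it cannot detect systematic differences such as prompt format or decoding settings.

```python
import math

# Numbers from this thread: lmms-eval output and the model-card figure.
measured = 0.4911 * 100  # mmmu_acc from lmms-eval, converted to percentage points
reported = 52.3          # MMMU val score quoted in the model card

# Assumption: MMMU val has 900 questions, each scored right/wrong independently.
n_questions = 900
p = measured / 100

# Difference between the model-card score and the measured score, in points.
gap_points = reported - measured

# Binomial standard error of an accuracy estimate, converted to points.
stderr_points = math.sqrt(p * (1 - p) / n_questions) * 100

print(f"gap: {gap_points:.2f} points")
print(f"approx. stderr: {stderr_points:.2f} points")
print(f"gap / stderr: {gap_points / stderr_points:.1f}")
```

Under these assumptions the gap comes out at a bit under two standard errors, so noise alone seems an unlikely full explanation, though it is not ruled out.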
