Reproducing accuracy results
#3, opened by dtrawins
What is the recommended method for reproducing the accuracy results presented in the model card?
When I run the evaluation with lmms-eval (https://github.com/EvolvingLMMs-Lab/lmms-eval), the results are slightly lower:
lmms-eval --model internvl2 --model_args=pretrained=OpenGVLab/InternVL2_5-4B --tasks mmmu_val
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val| 0|none | 0|mmmu_acc|↑ |0.4911|± | N/A|
That is 49.11, lower than the expected 52.3. Was a different dataset used? There is a similar gap of about 3 percentage points for the 8B and 1B models as well. Is some extra tuning needed?
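To make the gap concrete, here is a minimal sketch of the arithmetic. The helper name `accuracy_gap` is mine and purely illustrative. It assumes the 52.3 figure from the model card is a percentage and that the `mmmu_acc` value printed by lmms-eval is a fraction between 0 and 1.

```python
def accuracy_gap(reported_pct: float, measured_fraction: float) -> tuple[float, float]:
    """Return (gap in percentage points, gap relative to the reported score in %)."""
    # Assumption: the tool prints accuracy as a fraction, so scale it to percent.
    measured_pct = measured_fraction * 100.0
    gap_points = reported_pct - measured_pct
    relative_pct = gap_points / reported_pct * 100.0
    return gap_points, relative_pct


# Values taken from this post: model card says 52.3, my run gives 0.4911.
gap_points, relative_pct = accuracy_gap(52.3, 0.4911)
print(f"gap: {gap_points:.2f} points ({relative_pct:.1f}% relative)")
```

So the difference is about 3.19 percentage points in absolute terms, which is roughly 6% of the reported score.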