The performance on HumanEval
#5 · by TingchenFu
Hello, I evaluated the model on the HumanEval benchmark with the bigcode-evaluation-harness framework, and the pass@1 I get on HumanEval is much lower than the number reported in the original paper (3% vs. 18%). However, in my setup, the pass@1 on MBPP agrees with the reported number. Does anyone have any idea why this happens?
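For context, one common source of pass@1 mismatches is how the metric is estimated: some papers report greedy single-sample pass@1, while harnesses often use the unbiased pass@k estimator from the Codex paper over n sampled completions at temperature > 0. A minimal sketch of that estimator (function name and shape are mine, not from the harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., Codex paper).

    n: total completions sampled per problem
    c: number of those completions that pass the tests
    k: the k in pass@k
    """
    if n - c < k:
        # Too few failing samples to draw k failures: pass@k is 1.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 5 passing -> pass@1 = 0.5
print(pass_at_k(10, 5, 1))
```

If the harness is run with n_samples=1 and a nonzero temperature, pass@1 becomes a noisy single-sample estimate, which alone can explain a large gap from a reported greedy or many-sample number.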