The performance on HumanEval
#5 · by TingchenFu
Hello, I evaluated the model on the HumanEval benchmark with the bigcode-evaluation-harness framework, and the pass@1 I get on HumanEval is much lower than the number reported in the original paper (3% vs. 18%). However, in my setup, the pass@1 on MBPP agrees with the reported number. Does anyone have any idea why this happens?
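For context, one common source of pass@1 mismatches is how the metric is estimated: some papers report greedy single-sample pass@1, while harnesses often use the unbiased pass@k estimator from the Codex paper over n sampled completions at temperature > 0. A minimal sketch of that estimator (function name and shape are mine, not from the harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., Codex paper).

    n: total completions sampled per problem
    c: number of those completions that pass the tests
    k: the k in pass@k
    """
    if n - c < k:
        # Too few failing samples to draw k failures: pass@k is 1.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 5 passing -> pass@1 = 0.5
print(pass_at_k(10, 5, 1))
```

If the harness is run with n_samples=1 and a nonzero temperature, pass@1 becomes a noisy single-sample estimate, which alone can explain a large gap from a reported greedy or many-sample number.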