---
title: README
emoji: 🔥
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---

# EvalPlus: Rigorous Evaluation of LLMs for Code Generation

* 💻 **GitHub Repo**: [evalplus/evalplus](https://github.com/evalplus/evalplus)
* 🏆 **Leaderboard**: [evalplus.github.io](https://evalplus.github.io/leaderboard.html)
* 📜 **NeurIPS Paper**: [OpenReview](https://openreview.net/pdf?id=1qvx610Cu7)
* 🐍 **Python Package**: [PyPI](https://pypi.org/project/evalplus/)

```bibtex
@inproceedings{evalplus,
    title = {Is Your Code Generated by Chat{GPT} Really Correct? Rigorous Evaluation of Large Language Models for Code Generation},
    author = {Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming},
    booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
    year = {2023},
    url = {https://openreview.net/forum?id=1qvx610Cu7},
}
```