Digital Socrates: Evaluating LLMs through explanation critiques Paper • 2311.09613 • Published Nov 16, 2023
PromptBench: A Unified Library for Evaluation of Large Language Models Paper • 2312.07910 • Published Dec 13, 2023
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 16 days ago