Evaluation for Generative AI - a vincentkoc Collection


updated 5 days ago

Papers and resources that deal with the evaluation of large language models and generative AI.

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 75
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Paper • 2501.14492 • Published Jan 24, 2025 • 34
vincentkoc/tiny_qa_benchmark

Viewer • Updated 5 days ago • 52 • 89 • 1
vincentkoc/tiny_qa_benchmark_pp

Viewer • Updated 5 days ago • 662 • 254 • 1
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

Paper • 2505.12058 • Published 7 days ago • 6
tinyBenchmarks: evaluating LLMs with fewer examples

Paper • 2402.14992 • Published Feb 22, 2024 • 14