ThinkPRM Collection • Process Reward Models that Think (https://arxiv.org/abs/2504.16828) • 6 items • Updated 7 days ago
FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation Paper • 2410.22257 • Published Oct 29, 2024
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives Paper • 2504.10823 • Published Apr 15 • 14
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles Paper • 2109.11087 • Published Sep 23, 2021
A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models Paper • 2305.12544 • Published May 21, 2023 • 1
Discriminator-Guided Multi-step Reasoning with Language Models Paper • 2305.14934 • Published May 24, 2023 • 1
Source-Aware Training Enables Knowledge Attribution in Language Models Paper • 2404.01019 • Published Apr 1, 2024 • 1
Small Language Models Need Strong Verifiers to Self-Correct Reasoning Paper • 2404.17140 • Published Apr 26, 2024 • 1