The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think Paper • 2505.10185 • Published about 23 hours ago • 14
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 25
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 26
TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? - A Case Study on Korea Financial Texts Paper • 2502.07131 • Published Feb 10
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models Paper • 2309.02706 • Published Sep 6, 2023 • 2
KMMLU: Measuring Massive Multitask Language Understanding in Korean Paper • 2402.11548 • Published Feb 18, 2024
Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance Paper • 2301.03136 • Published Jan 9, 2023
EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models Paper • 2306.06662 • Published Jun 11, 2023
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do Paper • 2409.11239 • Published Sep 17, 2024 • 2
Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap Paper • 2501.02448 • Published Jan 5
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published Dec 10, 2024 • 2
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9