Evaluating Vision-Language Models as Evaluators in Path Planning Paper • 2411.18711 • Published Nov 27, 2024
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 23
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 25 • 1
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Paper • 2504.10342 • Published Apr 14 • 11
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time Paper • 2504.12329 • Published Apr 12
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think Paper • 2505.10185 • Published May 15 • 25
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 24
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 6 days ago • 54
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 6 days ago • 54
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 24
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3 • 17
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3 • 17
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Paper • 2505.20139 • Published May 26 • 18
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 105
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1 • 43