FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving Paper • 2406.14408 • Published Jun 20, 2024
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling Paper • 2407.09887 • Published Jul 13, 2024
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations Paper • 2311.13538 • Published Nov 22, 2023
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning Paper • 2505.19099 • Published May 25, 2025
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models Paper • 2310.10180 • Published Oct 16, 2023
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data Paper • 2402.08957 • Published Feb 14, 2024