CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30, 2025
S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity Paper • 2412.06289 • Published Dec 9, 2024
Taming Overconfidence in LLMs: Reward Calibration in RLHF Paper • 2410.09724 • Published Oct 13, 2024