PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving • Paper • 2503.21821 • Published Mar 26, 2025 • 17
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval • Paper • 2503.04644 • Published Mar 6, 2025 • 21
MRAG: A Modular Retrieval Framework for Time-Sensitive Question Answering • Paper • 2412.15540 • Published Dec 20, 2024
SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA • Paper • 2409.16682 • Published Sep 25, 2024
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding • Paper • 2501.12380 • Published Jan 21, 2025 • 86