MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published 9 days ago • 52
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 51
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Paper • 2502.09082 • Published Feb 13 • 29
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 54
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 88
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 51
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8, 2024 • 38
Question Translation Training for Better Multilingual Reasoning Paper • 2401.07817 • Published Jan 15, 2024 • 1