R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published 10 days ago • 43
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness Paper • 2406.16342 • Published Jun 24, 2024
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published about 1 month ago • 122
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities Paper • 2410.12219 • Published Oct 16, 2024 • 1
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 61
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 30
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization Paper • 2409.18433 • Published Sep 27, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31, 2024 • 62
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion Paper • 2402.12741 • Published Feb 20, 2024
CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering Paper • 2401.13170 • Published Jan 24, 2024 • 4
PANDA (Pedantic ANswer-correctness Determination and Adjudication): Improving Automatic Evaluation for Question Answering and Text Generation Paper • 2402.11161 • Published Feb 17, 2024 • 1
Mosaic IT: Enhancing Instruction Tuning with Data Mosaics Paper • 2405.13326 • Published May 22, 2024 • 1
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion Paper • 2410.13674 • Published Oct 17, 2024 • 17
BenTo: Benchmark Task Reduction with In-Context Transferability Paper • 2410.13804 • Published Oct 17, 2024 • 20