Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability Paper • 2412.18551 • Published Dec 24, 2024
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Paper • 2502.11751 • Published Feb 17, 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025 • 28
ViLBench: A Suite for Vision-Language Process Reward Modeling Paper • 2503.20271 • Published Mar 26, 2025 • 7
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Paper • 2503.11751 • Published Mar 14, 2025 • 16
Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference Paper • 2110.05362 • Published Oct 11, 2021
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published Feb 21, 2025 • 4
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task Paper • 1809.08887 • Published Sep 24, 2018 • 2
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks Paper • 1909.01716 • Published Sep 4, 2019
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models Paper • 2305.17311 • Published May 27, 2023 • 1
WILDS: A Benchmark of in-the-Wild Distribution Shifts Paper • 2012.07421 • Published Dec 14, 2020 • 1
LM-Critic: Language Models for Unsupervised Grammatical Error Correction Paper • 2109.06822 • Published Sep 14, 2021
Extending the WILDS Benchmark for Unsupervised Adaptation Paper • 2112.05090 • Published Dec 9, 2021 • 1
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Paper • 2201.05966 • Published Jan 16, 2022 • 1