Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Paper • 2503.14996 • Published Mar 19
Shedding More Light on Robust Classifiers under the lens of Energy-based Models Paper • 2407.06315 • Published Jul 8, 2024
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation Paper • 2504.17025 • Published Apr 23 • 17
ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Paper • 2410.05077 • Published Oct 7, 2024 • 2
Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities Paper • 2307.01870 • Published Jul 4, 2023
PREGO: online mistake detection in PRocedural EGOcentric videos Paper • 2404.01933 • Published Apr 2, 2024 • 1
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos Paper • 2411.02570 • Published Nov 4, 2024
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces Paper • 2503.05283 • Published Mar 7 • 3
Echoes from Alexandria: A Large Resource for Multilingual Book Summarization Paper • 2306.04334 • Published Jun 7, 2023 • 2
Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures Paper • 2212.01094 • Published Dec 2, 2022 • 2
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs Paper • 2410.15956 • Published Oct 21, 2024