Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published 22 days ago • 17
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation Paper • 2109.06379 • Published Sep 14, 2021
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs Paper • 2406.20098 • Published Jun 28
O1 Replication Journey: A Strategic Progress Report -- Part 1 Paper • 2410.18982 • Published Oct 8 • 1
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models Paper • 2308.16149 • Published Aug 30, 2023 • 25
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Paper • 2410.03960 • Published Oct 4 • 1
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 60
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18 • 13