Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale • Paper • 2409.17115 • Published Sep 25, 2024 • 63
Data Contamination Report from the 2024 CONDA Shared Task • Paper • 2407.21530 • Published Jul 31, 2024 • 10
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? • Paper • 2406.16772 • Published Jun 24, 2024 • 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI • Paper • 2406.12753 • Published Jun 18, 2024 • 14
Benchmarking Benchmark Leakage in Large Language Models • Paper • 2404.18824 • Published Apr 29, 2024 • 6
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math • Paper • 2312.17120 • Published Dec 28, 2023 • 27
Ask Again, Then Fail: Large Language Models' Vacillations in Judgement • Paper • 2310.02174 • Published Oct 3, 2023 • 3
Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study • Paper • 2304.04339 • Published Apr 10, 2023 • 1