MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19 • 38
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions Paper • 2502.13791 • Published Feb 19 • 5
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 15
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12