EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild Paper • 2502.14892 • Published 12 days ago • 4
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge Paper • 2502.16457 • Published 6 days ago • 10
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 12 items • Updated 9 days ago • 84
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B. • 10 items • Updated Dec 10, 2024 • 98
Granite 3.0 Language Models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 4 days ago • 96
Magpie Conversation Ko Collection Magpie 데이터셋 한국어 번역본 (@nayohan님 번역 모델 사용) • 10 items • Updated Nov 6, 2024 • 1
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures Paper • 2406.06565 • Published Jun 3, 2024 • 9
Magpie-Qwen2 Datasets Collection Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Jan 13 • 10
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68
Standard-format-preference-dataset Collection We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8, 2024 • 24
Eurus Collection Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated Oct 22, 2024 • 24