Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published 3 days ago • 25
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published 10 days ago • 42
ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind Paper • 2505.22961 • Published 29 days ago • 8
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15 • 18
Rethinking Diverse Human Preference Learning through Principal Component Analysis Paper • 2502.13131 • Published Feb 18 • 38
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5 • 24
MaxwellJryao/sft_loraMoE_wiki_hop_original_choose_best_object_affirmative_1-lora-sft_Qwen2-1.5B_lr-1e-3 Updated Sep 5, 2024
Post-training-Data-Flywheel/NousResearch-hermes-function-calling-v1 Viewer • Updated Aug 30, 2024 • 1.89k • 33
Post-training-Data-Flywheel/glaiveai-glaive-function-calling-v2 Viewer • Updated Aug 23, 2024 • 75.2k • 31 • 1
Post-training-Data-Flywheel/ise-uiuc-Magicoder-OSS-Instruct-75K Viewer • Updated Aug 23, 2024 • 75.2k • 26
Post-training-Data-Flywheel/Salesforce-xlam-function-calling-60k Viewer • Updated Aug 23, 2024 • 60k • 37
Post-training-Data-Flywheel/RLHFlow-CodeUltraFeedback-standard Viewer • Updated Aug 23, 2024 • 38.4k • 37 • 1