- spiral-rl/Spiral-Qwen3-4B
  4B • Updated • 15 • 3
- spiral-rl/Spiral-DeepSeek-R1-Distill-Qwen-7B
  8B • Updated • 7 • 2
- spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT
  Viewer • Updated • 25.5k
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
  Paper • 2506.24119 • Published • 35