We extended the ICM (Internal Coherence Maximization) paper to show cross-model capability transfer - using Qwen3's mathematical reasoning to improve Gemma3 without any human supervision.
Key results:
- Qwen3-0.6B: 63.2 → 66.0 on MATH-500 (+4% relative)
- Gemma3-1B: 41.0 → 45.6 on MATH-500 (+11% relative)
The method extracts coherent reasoning traces from one model via ICM, converts them into DPO preference pairs, and uses those pairs to train a completely different model architecture. This goes beyond the original ICM paper, which only improved a model using its own labels; we show you can transfer capabilities across model families - imagine extracting capabilities from strong models to improve your local ones. A sketch of the pipeline follows.
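To make the ICM-to-DPO step concrete, here is a minimal sketch of converting ICM-labeled teacher outputs into preference pairs and training the student with Hugging Face `datasets` and TRL's `DPOTrainer`. The `icm_labeled` records, the example problem, and the model checkpoint name are illustrative assumptions, not our actual data or config, and TRL argument names vary across releases.

```python
# Hypothetical transfer pipeline sketch: teacher (Qwen3) solutions scored by
# ICM become DPO preference pairs for the student (Gemma3).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed input format: each record pairs a solution ICM judged coherent
# with one it judged incoherent, for the same problem.
icm_labeled = [
    {
        "prompt": "Find the remainder when 7^100 is divided by 5.",
        "coherent": "7 ≡ 2 (mod 5), and 2^100 = (2^4)^25 ≡ 1 (mod 5). Remainder: 1.",
        "incoherent": "7^100 ends in 9, so the remainder is 4.",
    },
    # ... more ICM-labeled pairs extracted from the teacher model
]

# Convert ICM labels into the prompt/chosen/rejected format DPO expects.
dpo_data = Dataset.from_list(
    [
        {"prompt": r["prompt"], "chosen": r["coherent"], "rejected": r["incoherent"]}
        for r in icm_labeled
    ]
)

# Train the student on preferences derived from a different model family.
student_name = "google/gemma-3-1b-it"  # assumed student checkpoint
model = AutoModelForCausalLM.from_pretrained(student_name)
tokenizer = AutoTokenizer.from_pretrained(student_name)

trainer = DPOTrainer(
    model=model,                      # ref model defaults to a frozen copy
    args=DPOConfig(output_dir="gemma3-icm-dpo", beta=0.1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
trainer.train()
```

The key design point is that nothing in the DPO stage depends on the teacher's architecture or tokenizer: once ICM has produced the preference pairs, any student model can consume them.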
Next, we plan to extend this to code generation. The approach could enable community-driven capability sharing between model families without expensive human annotation.