---
language:
- multilingual
license: other
license_name: kwaipilot-license
license_link: LICENSE
library_name: transformers
---

| Stage | Core Idea | Key Techniques | Outcome |
|---|---|---|---|
| 1. Pre-training | Inject knowledge while separating “reasoning” from “direct answering”. | Dual-regime data: think-off queries are labeled via a custom tagging system; think-on queries are generated by a multi-agent solver. Knowledge Distillation + Multi-Token Prediction for fine-grained utility. | Base model attains strong factual and reasoning skills without full-scale pre-training costs. |
| 2. Post-training | Make reasoning optional and efficient. | Cold-start AutoThink: majority vote sets the initial thinking mode. Step-SRPO: intermediate supervision rewards correct mode selection and answer accuracy under that mode. | Model triggers CoT only when beneficial, reducing token use and speeding inference. |
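
The post-training stage can be illustrated with a minimal Python sketch. It is not the actual training code. It assumes that the cold-start majority vote is taken over a list of per-query `think_on` / `think_off` votes, and that the Step-SRPO signal can be approximated by a weighted sum of a mode-selection term and an answer-accuracy term. The function names, label strings, tie-break rule, and `mode_weight` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical label strings; the real tagging scheme is not specified here.
THINK_ON = "think_on"
THINK_OFF = "think_off"


def initial_thinking_mode(votes, tie_break=THINK_ON):
    """Pick the initial thinking mode for one query by majority vote.

    `votes` is a list of THINK_ON / THINK_OFF labels (assumed to come from
    several independent judgments of the same query).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - {THINK_ON, THINK_OFF}
    if unknown:
        raise ValueError(f"unknown vote labels: {sorted(unknown)}")
    counts = Counter(votes)
    if counts[THINK_ON] == counts[THINK_OFF]:
        return tie_break  # assumption: ties fall back to a fixed default
    return THINK_ON if counts[THINK_ON] > counts[THINK_OFF] else THINK_OFF


def step_reward(predicted_mode, target_mode, answer_correct, mode_weight=0.5):
    """Toy two-part reward: correct mode selection plus answer accuracy.

    The weighting is an illustrative assumption, not the published objective.
    """
    mode_term = 1.0 if predicted_mode == target_mode else 0.0
    answer_term = 1.0 if answer_correct else 0.0
    return mode_weight * mode_term + (1.0 - mode_weight) * answer_term


if __name__ == "__main__":
    target = initial_thinking_mode([THINK_OFF, THINK_OFF, THINK_ON])
    print(target)
    print(step_reward(THINK_OFF, target, answer_correct=True))
```

In this sketch a response earns the full reward only when it both picks the voted mode and answers correctly, which mirrors the table's description of rewarding mode selection and accuracy together.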