| | |
|:---:|:---:|
| **Architecture** | Mixture-of-Experts (MoE) |
| **Total Parameters** | 1T |
| **Activated Parameters** | 32B |
| **Number of Layers** (Dense layer included) | 61 |
| **Number of Dense Layers** | 1 |
| **Attention Hidden Dimension** | 7168 |
| **MoE Hidden Dimension** (per Expert) | 2048 |
| **Number of Attention Heads** | 64 |
| **Number of Experts** | 384 |
| **Selected Experts per Token** | 8 |
| **Number of Shared Experts** | 1 |
| **Vocabulary Size** | 160K |
| **Context Length** | 128K |
| **Attention Mechanism** | MLA |
| **Activation Function** | SwiGLU |
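
The figures in the architecture table imply a few derived quantities, such as how sparse the model is per token. The sketch below works them out. It is illustrative only: the class name `K2Spec` and its property names are hypothetical, "1T" and "32B" are treated as exactly 1e12 and 32e9 even though the table gives rounded values, and it assumes that every non-dense layer is an MoE layer and that the shared expert runs for every token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class K2Spec:
    """Hypothetical container for the values listed in the architecture table."""

    total_params: float = 1e12      # "1T" (rounded figure, taken as exact here)
    activated_params: float = 32e9  # "32B" (rounded figure, taken as exact here)
    num_layers: int = 61            # dense layer included
    num_dense_layers: int = 1
    num_experts: int = 384
    experts_per_token: int = 8
    num_shared_experts: int = 1

    @property
    def num_moe_layers(self) -> int:
        # Assumption: every layer that is not dense is an MoE layer.
        return self.num_layers - self.num_dense_layers

    @property
    def active_param_fraction(self) -> float:
        # Share of all parameters that participate in one forward pass per token.
        return self.activated_params / self.total_params

    @property
    def routed_expert_fraction(self) -> float:
        # Share of routed experts selected for each token.
        return self.experts_per_token / self.num_experts

    @property
    def experts_run_per_token(self) -> int:
        # Assumption: the shared expert is always active alongside the routed ones.
        return self.experts_per_token + self.num_shared_experts


spec = K2Spec()
print(spec.num_moe_layers)                   # 60
print(f"{spec.active_param_fraction:.1%}")   # 3.2%
print(f"{spec.routed_expert_fraction:.2%}")  # 2.08%
print(spec.experts_run_per_token)            # 9
```

Under these assumptions, roughly 3% of the parameters are active for any given token, which is the property that separates the "Total Parameters" and "Activated Parameters" rows.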

| Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Coding Tasks** | | | | | | | | |
| LiveCodeBench v6 (Aug 24 - May 25) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
| OJBench | Pass@1 | 27.1 | 24.0 | 11.3 | 15.3 | 19.6 | 19.5 | 19.5 |
| MultiPL-E | Pass@1 | 86.7 | 83.1 | 78.2 | 88.6 | 89.6 | 86.7 | 85.6 |
| SWE-bench Verified (Agentless Coding) | Single Patch | 51.8 | 36.6 | 39.4 | 50.2 | 53.0 | 40.8 | 32.6 |
| SWE-bench Verified (Agentic Coding) | Single Attempt (Acc) | 65.8 | 38.8 | 34.4 | 72.7 | 72.5* | 54.6 | — |
| SWE-bench Verified (Agentic Coding) | Multiple Attempts (Acc) | 71.6 | — | — | 80.2 | 79.4* | — | — |
| SWE-bench Multilingual (Agentic Coding) | Single Attempt (Acc) | 47.3 | 25.8 | 20.9 | 51.0 | — | 31.5 | — |
| SWE-bench Multilingual (Agentic Coding) | Inhouse Framework (Acc) | 30.0 | — | — | 35.5 | 43.2 | 8.30 | — |
| TerminalBench | Acc | 25.0 | 16.3 | 6.60 | — | — | 30.3 | 16.8 |
| Aider-Polyglot | Acc | 60.0 | 55.1 | 61.8 | 56.4 | 70.7 | 52.4 | 44.0 |
| **Tool Use Tasks** | | | | | | | | |
| Tau2 retail | Avg@4 | 70.6 | 69.1 | 57.0 | 75.0 | 81.8 | 74.8 | 64.3 |
| Tau2 airline | Avg@4 | 56.5 | 39.0 | 26.5 | 55.5 | 60.0 | 54.5 | 42.5 |
| Tau2 telecom | Avg@4 | 65.8 | 32.5 | 22.1 | 45.2 | 57.0 | 38.6 | 16.9 |
| AceBench | Acc | 76.5 | 72.7 | 70.5 | 76.2 | 75.6 | 80.1 | 74.5 |
| **Math & STEM Tasks** | | | | | | | | |
| AIME 2024 | Avg@64 | 69.6 | 59.4* | 40.1* | 43.4 | 48.2 | 46.5 | 61.3 |
| AIME 2025 | Avg@64 | 49.5 | 46.7 | 24.7* | 33.1* | 33.9* | 37.0 | 46.6 |
| MATH-500 | Acc | 97.4 | 94.0* | 91.2* | 94.0 | 94.4 | 92.4 | 95.4 |
| HMMT 2025 | Avg@32 | 38.8 | 27.5 | 11.9 | 15.9 | 15.8 | 19.4 | 34.7 |
| CNMO 2024 | Avg@16 | 74.3 | 74.7 | 48.6 | 60.4 | 57.6 | 56.6 | 75.0 |
| PolyMath-en | Avg@4 | 65.1 | 59.5 | 51.9 | 52.8 | 49.8 | 54.0 | 49.9 |
| ZebraLogic | Acc | 89.0 | 84.0 | 37.7 | 73.7 | 59.3 | 58.5 | 57.9 |
| AutoLogi | Acc | 89.5 | 88.9 | 83.3 | 89.8 | 86.1 | 88.2 | 84.1 |
| GPQA-Diamond | Avg@8 | 75.1 | 68.4* | 62.9* | 70.0* | 74.9* | 66.3 | 68.2 |
| SuperGPQA | Acc | 57.2 | 53.7 | 50.2 | 55.7 | 56.5 | 50.8 | 49.6 |
| Humanity's Last Exam (Text Only) | — | 4.7 | 5.2 | 5.7 | 5.8 | 7.1 | 3.7 | 5.6 |
| **General Tasks** | | | | | | | | |
| MMLU | EM | 89.5 | 89.4 | 87.0 | 91.5 | 92.9 | 90.4 | 90.1 |
| MMLU-Redux | EM | 92.7 | 90.5 | 89.2 | 93.6 | 94.2 | 92.4 | 90.6 |
| MMLU-Pro | EM | 81.1 | 81.2* | 77.3 | 83.7 | 86.6 | 81.8 | 79.4 |
| IFEval | Prompt Strict | 89.8 | 81.1 | 83.2* | 87.6 | 87.4 | 88.0 | 84.3 |
| Multi-Challenge | Acc | 54.1 | 31.4 | 34.0 | 46.8 | 49.0 | 36.4 | 39.5 |
| SimpleQA | Correct | 31.0 | 27.7 | 13.2 | 15.9 | 22.8 | 42.3 | 23.3 |
| Livebench | Pass@1 | 76.4 | 72.4 | 67.6 | 74.8 | 74.6 | 69.8 | 67.8 |

| Benchmark | Metric | Shot | Kimi K2 Base | Deepseek-V3-Base | Qwen2.5-72B | Llama 4 Maverick |
|:-------------------:|:----------:|:---------:|:--------------:|:------------------:|:-------------:|:------------------:|
| **General Tasks** | | | | | | |
| MMLU | EM | 5-shot | **87.79** | 87.1 | 86.08 | 84.87 |
| MMLU-pro | EM | 5-shot | **69.17** | 60.59 | 62.8 | 63.47 |
| MMLU-redux-2.0 | EM | 5-shot | **90.17** | 89.53 | 87.77 | 88.18 |
| SimpleQA | Correct | 5-shot | **35.25** | 26.49 | 10.31 | 23.74 |
| TriviaQA | EM | 5-shot | **85.09** | 84.11 | 76.03 | 79.25 |
| GPQA-Diamond | Avg@8 | 5-shot | 48.11 | **50.51** | 40.78 | 49.43 |
| SuperGPQA | EM | 5-shot | **44.67** | 39.2 | 34.23 | 38.84 |
| **Code Tasks** | | | | | | |
| LiveCodeBench v6 | Pass@1 | 1-shot | **26.29** | 22.86 | 21.14 | 25.14 |
| EvalPlus | Pass@1 | - | **80.33** | 65.61 | 66.04 | 65.48 |
| **Mathematics Tasks** | | | | | | |
| MATH | EM | 4-shot | **70.22** | 60.06 | 60.96 | 63.02 |
| GSM8k | EM | 8-shot | **92.12** | 91.66 | 90.37 | 86.35 |
| **Chinese Tasks** | | | | | | |
| C-Eval | EM | 5-shot | **92.5** | 90.04 | 90.86 | 80.91 |
| CSimpleQA | Correct | 5-shot | **77.57** | 72.13 | 50.53 | 53.47 |