Kimi K2: 1T-Param MoE Model for Agentic AI
by reach-vb
Kimi K2: Open Agentic Intelligence
Overview
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. It excels at frontier knowledge, reasoning, and coding tasks, and is optimized specifically for agentic capabilities.
Key Features
- Large-Scale Training: Pre-trained on 15.5T tokens with zero training instability.
- MuonClip Optimizer: The Muon optimizer augmented with a qk-clip step that rescales attention projections to keep attention logits bounded, enabling stable scaling.
- Agentic Intelligence: Designed for tool use, reasoning, and autonomous problem-solving.
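The qk-clip idea behind MuonClip can be illustrated with a small numerical sketch: after an optimizer step, if the maximum attention logit exceeds a threshold tau, both the query and key projections are rescaled by sqrt(tau / max_logit), pulling the maximum logit back to tau. All names and shapes below are illustrative (single head, NumPy), not Kimi K2's actual training code.

```python
import numpy as np

def qk_clip(W_q, W_k, X, tau=100.0):
    """Simplified per-head qk-clip: if the max attention logit over a
    batch X exceeds tau, scale W_q and W_k by sqrt(tau / max_logit) so
    the new max logit equals tau. Illustrative only."""
    Q, K = X @ W_q, X @ W_k
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    s_max = logits.max()
    if s_max > tau:
        gamma = np.sqrt(tau / s_max)       # scales logits by gamma**2 = tau / s_max
        W_q, W_k = W_q * gamma, W_k * gamma
    return W_q, W_k

# Deliberately large weights so the clip triggers
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16)) * 4.0
W_q = rng.normal(size=(16, 16)) * 4.0
W_k = rng.normal(size=(16, 16)) * 4.0
W_q2, W_k2 = qk_clip(W_q, W_k, X, tau=10.0)
new_max = ((X @ W_q2) @ (X @ W_k2).T / 4.0).max()  # max logit now at tau
```

Because both projections are scaled by the same factor, the logits shrink by gamma squared, so a single clip lands the maximum exactly on the threshold.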
Model Variants
- Kimi-K2-Base: Foundation model for fine-tuning.
- Kimi-K2-Instruct: Post-trained for general-purpose chat and agentic tasks.
Technical Specifications
- Architecture: Mixture-of-Experts (MoE)
- Total Parameters: 1T
- Activated Parameters: 32B
- Layers: 61 (including 1 dense layer)
- Attention Hidden Dimension: 7168
- MoE Hidden Dimension: 2048 (per expert)
- Attention Heads: 64
- Experts: 384 (plus 1 shared expert)
- Selected Experts per Token: 8
- Vocabulary Size: 160K
- Context Length: 128K
- Attention Mechanism: MLA
- Activation Function: SwiGLU
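With 384 experts and 8 selected per token, only about 32B of the 1T total parameters participate in each forward pass. A minimal sketch of top-k expert routing is below; the routing function and shapes are illustrative, not Kimi K2's actual implementation.

```python
import numpy as np

def route_topk(router_logits, k=8):
    """Select the top-k experts per token and renormalize their gate
    weights with a softmax over just the selected logits."""
    # Indices of the k largest logits per token
    topk_idx = np.argpartition(router_logits, -k, axis=-1)[:, -k:]
    topk_logits = np.take_along_axis(router_logits, topk_idx, axis=-1)
    # Numerically stable softmax restricted to the selected experts
    gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return topk_idx, gates

n_tokens, n_experts = 4, 384
logits = np.random.default_rng(0).normal(size=(n_tokens, n_experts))
idx, gates = route_topk(logits, k=8)
# Each token activates 8 of 384 experts; its gate weights sum to 1.
```

Only the selected experts' feed-forward weights are touched per token, which is what keeps the activated parameter count far below the total.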
Evaluation Highlights
Kimi-K2-Instruct posts strong results across coding, tool-use, math, and general benchmarks:
- Coding: LiveCodeBench (53.7% Pass@1), SWE-bench Verified (71.6% Multiple Attempts Acc)
- Tool Use: Tau2 retail (70.6% Avg@4), AceBench (76.5% Acc)
- Math & STEM: AIME 2024 (69.6% Avg@64), MATH-500 (97.4% Acc)
- General Tasks: MMLU (89.5% EM), Livebench (76.4% Pass@1)
Deployment
- API: Available on Moonshot AI platform (OpenAI/Anthropic-compatible).
- Inference Engines: vLLM, SGLang, KTransformers, TensorRT-LLM.
- Model Format: Block-fp8 on Hugging Face.
Usage Examples
Chat Completion
def simple_chat(client, model_name):
    # `client` is any OpenAI-compatible client pointed at a Kimi K2 endpoint
    messages = [
        {"role": "system", "content": "You are Kimi..."},
        {"role": "user", "content": [{"type": "text", "text": "Self-intro"}]},
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.6,  # recommended sampling temperature for Kimi-K2-Instruct
        max_tokens=256,
    )
    print(response.choices[0].message.content)
Tool Calling
def tool_call_with_client(client, model_name):
    # Define tools (JSON schemas) and tool_map (tool name -> Python function)
    messages = [...]
    finish_reason = None
    while finish_reason in (None, "tool_calls"):
        completion = client.chat.completions.create(...)
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        # When the model requests tools, execute each call via tool_map and
        # append the results to messages before the next iteration
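The tool-execution step inside that loop amounts to a small dispatcher: each tool call carries a function name and JSON-encoded arguments, and each result goes back into the conversation as a "tool" message. The sketch below follows the OpenAI chat-completions message shapes (the Moonshot API is OpenAI-compatible); the tool name, `tool_map`, and the weather function are hypothetical.

```python
import json

def run_tool_calls(tool_calls, tool_map):
    """Execute each requested tool and build the 'tool' role messages
    appended to the conversation before the next model turn."""
    results = []
    for call in tool_calls:
        fn = tool_map[call["function"]["name"]]          # look up the Python callable
        args = json.loads(call["function"]["arguments"])  # arguments arrive JSON-encoded
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],                   # ties the result to the request
            "content": json.dumps(fn(**args)),
        })
    return results

# Hypothetical tool plus a simulated model tool call:
tool_map = {"get_weather": lambda city: {"city": city, "temp_c": 21}}
calls = [{"id": "call_0",
          "function": {"name": "get_weather",
                       "arguments": json.dumps({"city": "Beijing"})}}]
msgs = run_tool_calls(calls, tool_map)
```

Keeping the dispatcher pure (tool calls in, tool messages out) makes the agent loop itself a thin wrapper around the chat-completions API.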
License
Modified MIT License.