Kimi K2: 1T-Param MoE Model for Agentic AI

Discussion #26, opened by reach-vb

Kimi K2: Open Agentic Intelligence

Overview

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. It excels at frontier knowledge, reasoning, and coding tasks, and is specifically optimized for agentic capabilities.

Key Features

  • Large-Scale Training: Pre-trained on 15.5T tokens with zero training instability.
  • MuonClip Optimizer: A novel optimizer (Muon combined with QK-clip) for stable scaling to trillion-parameter models.
  • Agentic Intelligence: Designed for tool use, reasoning, and autonomous problem-solving.
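Moonshot's technical report describes MuonClip as Muon combined with a QK-clip step that rescales the query/key projections whenever pre-softmax attention logits grow too large. The snippet below is a toy sketch of the clipping idea only, not the actual optimizer; the matrix sizes, threshold value, and function names are all illustrative assumptions.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def qk_clip(w_q, w_k, x, tau):
    """Toy QK-clip sketch: if the largest pre-softmax attention logit on
    batch x exceeds tau, rescale both projections by sqrt(tau / max_logit)
    so the maximum logit is pulled back down to tau."""
    d = len(w_q[0])
    q, k = matmul(x, w_q), matmul(x, w_k)
    logits = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d)
               for kj in k] for qi in q]
    max_logit = max(max(row) for row in logits)
    if max_logit > tau:
        gamma = math.sqrt(tau / max_logit)  # split the rescale across Q and K
        w_q = [[v * gamma for v in row] for row in w_q]
        w_k = [[v * gamma for v in row] for row in w_k]
    return w_q, w_k

random.seed(0)
x = [[random.gauss(0, 4) for _ in range(8)] for _ in range(6)]   # activations
w_q = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w_k = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w_q2, w_k2 = qk_clip(w_q, w_k, x, tau=5.0)
```

Because both projections share the rescale, attention logits shrink by exactly tau / max_logit after one clip, which is what keeps the softmax inputs bounded during training.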

Model Variants

  • Kimi-K2-Base: Foundation model for fine-tuning.
  • Kimi-K2-Instruct: Post-trained for general-purpose chat and agentic tasks.

Technical Specifications

  • Architecture: Mixture-of-Experts (MoE)
  • Total Parameters: 1T
  • Activated Parameters: 32B
  • Layers: 61 (including 1 dense layer)
  • Attention Hidden Dimension: 7168
  • MoE Hidden Dimension: 2048 (per expert)
  • Attention Heads: 64
  • Experts: 384
  • Selected Experts per Token: 8
  • Vocabulary Size: 160K
  • Context Length: 128K
  • Attention Mechanism: MLA
  • Activation Function: SwiGLU
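The sparsity implied by these numbers can be checked with simple arithmetic: each token routes to 8 of 384 experts, so only about 2% of expert weights are active per token, which is how 1T total parameters yields roughly 32B activated. A rough sanity check (the per-expert SwiGLU sizing below is an assumption derived from the listed hidden dimensions, not Moonshot's exact parameter breakdown):

```python
# Rough sanity check of the MoE sparsity implied by the spec table.
total_experts = 384
active_experts = 8
expert_fraction = active_experts / total_experts  # ~2.08% of experts per token

# SwiGLU FFN per expert: three weight matrices between the attention hidden
# dim (7168) and the per-expert MoE hidden dim (2048). Illustrative only.
d_model, d_expert = 7168, 2048
params_per_expert = 3 * d_model * d_expert

active_ffn_params_per_layer = active_experts * params_per_expert
total_ffn_params_per_layer = total_experts * params_per_expert

print(f"expert fraction active per token: {expert_fraction:.2%}")
```

The activated/total FLOPs ratio on the expert FFNs equals the routing fraction (8/384), independent of the per-expert size assumed above.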

Evaluation Highlights

Kimi K2 reports strong results across coding, tool-use, math, and general-knowledge benchmarks:

  • Coding: LiveCodeBench (53.7% Pass@1), SWE-bench Verified (71.6% Multiple Attempts Acc)
  • Tool Use: Tau2 retail (70.6% Avg@4), AceBench (76.5% Acc)
  • Math & STEM: AIME 2024 (69.6% Avg@64), MATH-500 (97.4% Acc)
  • General Tasks: MMLU (89.5% EM), Livebench (76.4% Pass@1)

Deployment

  • API: Available on Moonshot AI platform (OpenAI/Anthropic-compatible).
  • Inference Engines: vLLM, SGLang, KTransformers, TensorRT-LLM.
  • Model Format: Weights released in block-FP8 format on Hugging Face.
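Because the API is OpenAI-compatible, calling it amounts to pointing a standard client at Moonshot's endpoint. Below is a minimal sketch of the request that would be POSTed to the chat completions route; the base URL, model id, and environment-variable name are assumptions to verify against the platform docs.

```python
import json
import os

# Assumed endpoint and model id -- verify against Moonshot's platform docs.
BASE_URL = "https://api.moonshot.ai/v1"
API_KEY = os.environ.get("MOONSHOT_API_KEY", "")

def chat_request(model, user_text, temperature=0.6, max_tokens=256):
    """Build the JSON body and headers for POST {BASE_URL}/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    return json.dumps(body), headers

body, headers = chat_request("kimi-k2-instruct", "Hello, Kimi")
```

The same request shape works against a local vLLM or SGLang server, which expose the same OpenAI-compatible route with a different base URL.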

Usage Examples

Chat Completion

# `client` is an OpenAI-compatible client pointed at a Kimi K2 endpoint
# (e.g. the Moonshot API or a local vLLM/SGLang server).
def simple_chat(client, model_name):
    messages = [
        {"role": "system", "content": "You are Kimi..."},
        {"role": "user", "content": [{"type": "text", "text": "Self-intro"}]}
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.6,  # recommended sampling temperature for K2-Instruct
        max_tokens=256
    )
    print(response.choices[0].message.content)

Tool Calling

def tool_call_with_client(client, model_name):
    # Define tools (JSON schemas) and tool_map (tool name -> callable) first
    messages = [...]
    finish_reason = None  # loop until the model stops requesting tools
    while finish_reason in [None, "tool_calls"]:
        completion = client.chat.completions.create(
            model=model_name, messages=messages, tools=tools, temperature=0.6)
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            messages.append(choice.message)
            # Run each requested tool via tool_map and append its result as a
            # {"role": "tool", ...} message so the model can continue
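The `tools` schema and `tool_map` dispatch table that the loop above assumes can be sketched as follows. The weather tool and all of its names are invented for illustration; the schema format follows the OpenAI-compatible function-calling convention.

```python
import json

# Hypothetical tool for illustration only.
def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would call a weather API."""
    return json.dumps({"city": city, "forecast": "sunny"})

# JSON schema advertised to the model (OpenAI-compatible tools format).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Dispatch table consulted when finish_reason == "tool_calls".
tool_map = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Look up the requested tool and run it with the model-supplied args."""
    args = json.loads(arguments_json)
    return tool_map[name](**args)
```

Each tool result is appended back to `messages` with the matching `tool_call_id`, so the model sees the output on its next turn.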

License

Modified MIT License.

Contact

[email protected]

