metadata

license: apache-2.0
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  - Qwen/Qwen3-8B
  - Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
tags:
  - merge

Model Highlights:

Optimal merge method: nuslerp
Highest precision: dtype: float32 + out_dtype: bfloat16
Brand-new chat template: ensures normal operation on LM Studio
Context length: 32768 (a 128K context version is also available)

Parameter Settings:

Thinking Mode: (Recommended)

Temperature=0.6, TopP=0.95, TopK=20, MinP=0

Non-thinking Mode: (Not recommended)

\no_think may not always take effect

Temperature=0.7, TopP=0.8, TopK=20, MinP=0
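
The two presets above can be kept in a small helper so they are not retyped by hand. This is only a minimal sketch: the function name `sampling_kwargs` is hypothetical, and the key names (`temperature`, `top_p`, `top_k`, `min_p`) assume a runtime that accepts Hugging Face style generation arguments. Other runtimes, such as LM Studio, expose the same settings under their own labels.

```python
# Sampling presets copied from the recommendations in this model card.
# The dictionary keys are an assumption about the target runtime's naming.
THINKING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
NON_THINKING = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}


def sampling_kwargs(thinking: bool = True) -> dict:
    """Return a copy of the preset for the requested mode.

    Thinking mode is the default because it is the recommended mode.
    """
    preset = THINKING if thinking else NON_THINKING
    return dict(preset)  # copy so callers cannot mutate the shared preset


if __name__ == "__main__":
    print(sampling_kwargs(thinking=True))
    print(sampling_kwargs(thinking=False))
```

With a runtime that takes these names as keyword arguments, the result could be unpacked into the generation call, for example `generate(..., do_sample=True, **sampling_kwargs())`.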

Configuration:

The following YAML configuration was used to produce this model:

models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
    parameters:
      weight: 1
  - model: Qwen/Qwen3-8B
    parameters:
      weight: 1
merge_method: nuslerp
base_model: Qwen/Qwen3-8B-Base
tokenizer_source: Qwen/Qwen3-8B
parameters:
  normalize: true
  int8_mask: true
dtype: float32
out_dtype: bfloat16
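
To give some intuition for what this configuration asks for, here is a toy illustration of spherical interpolation between two task vectors (each model's difference from the base). It is a sketch under stated assumptions, not mergekit's actual implementation: the function names `slerp` and `merge_with_base` are hypothetical, tensors are flattened into plain Python lists, and the interpolation factor is assumed to be the second model's share of the total weight, so equal weights of 1 give t = 0.5.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat vectors (lists of floats)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # (Anti)parallel vectors: the slerp formula is unstable, use lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def merge_with_base(base, model_a, model_b, weight_a=1.0, weight_b=1.0):
    """Slerp the two task vectors, then add the result back onto the base."""
    t = weight_b / (weight_a + weight_b)  # assumed weight normalization
    delta_a = [m - b for m, b in zip(model_a, base)]
    delta_b = [m - b for m, b in zip(model_b, base)]
    merged_delta = slerp(t, delta_a, delta_b)
    return [b + d for b, d in zip(base, merged_delta)]


if __name__ == "__main__":
    # Toy two-dimensional "tensors" standing in for real model weights.
    print(merge_with_base([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

In the toy run, the two task vectors are orthogonal unit vectors, so the merged vector lies halfway along the arc between them and keeps unit length, which plain averaging would not.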