metadata
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
tags:
- merge
Model Highlights:
Optimal merge method: nuslerp
Highest precision: dtype: float32 + out_dtype: bfloat16
Brand-new chat template: ensures normal operation on LM Studio
Context length: 32768
128K context version
Parameter Settings:
Thinking Mode: (Recommended)
Temperature=0.6, TopP=0.95, TopK=20, MinP=0.
Non-thinking Mode: (Not recommended)
The /no_think switch may not always work.
Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
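The two presets can be kept in code so the right values are picked per mode. The following is a minimal sketch, not part of the original card: the helper names are hypothetical, the dictionary keys assume the common Hugging Face sampling names (`temperature`, `top_p`, `top_k`, `min_p`), and the soft switch is taken to be `/no_think`, the form used in Qwen3's documentation.

```python
# Sampling presets copied from the Parameter Settings section.
# Assumption: key names follow the usual Hugging Face generation arguments.
SAMPLING_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}


def sampling_kwargs(thinking: bool = True) -> dict:
    """Return a fresh dict of sampling arguments for the requested mode."""
    mode = "thinking" if thinking else "non_thinking"
    # do_sample=True is added because these values only matter when sampling.
    return dict(SAMPLING_PRESETS[mode], do_sample=True)


def build_user_prompt(text: str, thinking: bool = True) -> str:
    """Append the soft switch when non-thinking mode is requested (hypothetical helper)."""
    return text if thinking else f"{text} /no_think"
```

With a loaded model, the result would typically be unpacked into the generation call, for example `model.generate(**inputs, **sampling_kwargs(thinking=True))`. Thinking mode is the default here because the card recommends it.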
Configuration:
The following YAML configuration was used to produce this model:
models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
    parameters:
      weight: 1
  - model: Qwen/Qwen3-8B
    parameters:
      weight: 1
merge_method: nuslerp
base_model: Qwen/Qwen3-8B-Base
tokenizer_source: Qwen/Qwen3-8B
parameters:
  normalize: true
  int8_mask: true
dtype: float32
out_dtype: bfloat16
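To give a rough idea of what a spherical merge with a base model does, here is a simplified sketch in plain Python. It is not mergekit's implementation: it assumes each model is a flat list of floats, that the interpolation acts on the differences from the base model (task vectors), and that the two weights are turned into a single interpolation factor `t = weight_b / (weight_a + weight_b)`. The function names are hypothetical.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two vectors (lists of floats)."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    if n0 < eps or n1 < eps:
        # Degenerate input: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Angle near 0 or pi: the slerp formula divides by ~0, so use lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def merge_on_task_vectors(base, model_a, model_b, weight_a=1.0, weight_b=1.0):
    """Interpolate the two task vectors and add the result back to the base."""
    t = weight_b / (weight_a + weight_b)  # equal weights give t = 0.5
    delta_a = [m - b for m, b in zip(model_a, base)]
    delta_b = [m - b for m, b in zip(model_b, base)]
    merged_delta = slerp(t, delta_a, delta_b)
    return [b + d for b, d in zip(base, merged_delta)]
```

With `weight: 1` on both models, as in the configuration above, the sketch lands halfway along the arc between the two task vectors. The real merge works tensor by tensor and computes in float32 before writing bfloat16, which this sketch does not model.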