---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- AXCXEPT/Qwen3-EZO-8B-beta
pipeline_tag: text-generation
tags:
- merge
---
## Model Highlights:

- **merge method**: `slerp` (see the sketch after this list)
- **highest precision**: `dtype: float32` + `out_dtype: bfloat16`
- **brand-new chat template**: ensures normal operation on LM Studio
- **context length**: `131072`
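For intuition, `slerp` (spherical linear interpolation) blends the two parent models along the great-circle arc between their weight vectors rather than along a straight line. A minimal NumPy sketch of the idea follows; it is an illustration only, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns a, t=1 returns b; intermediate t follows the arc between
    the two weight directions instead of the straight chord.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two weight directions
    if abs(np.sin(omega)) < eps:
        # Nearly parallel vectors: plain linear interpolation is numerically safer.
        return (1.0 - t) * a + t * b
    so = np.sin(omega)
    out = (np.sin((1.0 - t) * omega) / so) * a_flat + (np.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape)
```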
## Model Selection Table:

| Model | Context | Uses Base Model |
|---|---|---|
| Qwen3-EZO-8B-YOYO-slerp | 32K | Yes |
| Qwen3-EZO-8B-YOYO-slerp-128K | 128K | Yes |
| Qwen3-EZO-8B-YOYO-nuslerp | 32K | No |
| Qwen3-EZO-8B-YOYO-nuslerp-128K | 128K | No |
| Qwen3-EZO-8B-YOYO-nuslerp-plus | 32K | Yes |
| Qwen3-EZO-8B-YOYO-nuslerp-plus-128K | 128K | Yes |
> **Warning**: Models with `128K` context may have slight quality loss. In most cases, please use the `32K` native context!
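If you are unsure which variant you have, the advertised context window can be read from the checkpoint's config. A small check with `transformers`; the repo id below is an assumption for illustration, so substitute the variant you actually downloaded:

```python
from transformers import AutoConfig

# Assumed repo id; the exact org/name may differ.
cfg = AutoConfig.from_pretrained("YOYO-AI/Qwen3-EZO-8B-YOYO-slerp-128K")
print(cfg.max_position_embeddings)         # expect 131072 on the 128K variants
print(getattr(cfg, "rope_scaling", None))  # long-context (e.g. YaRN) settings, if present
```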
## Parameter Settings:

### Thinking Mode:

`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.
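These values map directly onto `transformers` sampling arguments. A minimal generation sketch, again assuming a hypothetical repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for illustration; substitute the model you downloaded.
model_id = "YOYO-AI/Qwen3-EZO-8B-YOYO-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain spherical interpolation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking-mode sampling settings recommended on this card.
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```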
## Configuration:

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
        layer_range: [0, 36]
      - model: AXCXEPT/Qwen3-EZO-8B-beta
        layer_range: [0, 36]
merge_method: slerp
base_model: AXCXEPT/Qwen3-EZO-8B-beta
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
```
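Here `t` is the interpolation weight per layer: `t=0` keeps the base model (`AXCXEPT/Qwen3-EZO-8B-beta`) and `t=1` takes `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`. The `value` lists are gradients interpolated across the 36 layers, giving the self-attention and MLP weights opposite depth curves, while all remaining tensors use the flat `0.5`. To reproduce the merge, the config can be saved as, say, `config.yaml` and run with mergekit's CLI: `mergekit-yaml config.yaml ./output-model`.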