---
license: apache-2.0
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  - AXCXEPT/Qwen3-EZO-8B-beta
pipeline_tag: text-generation
tags:
  - merge
---

Model Highlights:

  • Merge method: SLERP

  • Highest precision: dtype: float32 + out_dtype: bfloat16

  • Updated chat template: ensures correct operation in LM Studio

  • Context length: 131072 tokens

Model Selection Table:

Warning: Models extended to 128K context may show slight quality loss. In most cases, please use the native 32K context.

Parameter Settings:

Thinking Mode:

Temperature=0.6, TopP=0.95, TopK=20, MinP=0.
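These settings interact when tokens are sampled. As a generic illustration (not this model's exact sampler), the sketch below shows the usual order of operations on a logit vector: temperature scaling, top-k truncation, min-p filtering, then top-p (nucleus) truncation:

```python
import math

def sample_filter(logits, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0):
    """Return the token ids that survive the recommended sampling filters.

    Generic illustration only; real samplers differ in details.
    """
    # Temperature scaling, then softmax (numerically stabilized).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]

    # Min-p: drop tokens below min_p times the max probability (0 disables it).
    p_max = probs[ranked[0]]
    ranked = [i for i in ranked if probs[i] >= min_p * p_max]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

# With one dominant logit, nucleus sampling leaves a single candidate.
print(sample_filter([1.0, 2.0, 0.5, 6.0]))  # → [3]
```

Note how a low MinP (here 0) leaves the nucleus cutoff to TopP, while the low temperature sharpens the distribution before any truncation happens.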

Configuration:

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
        layer_range: [0, 36]
      - model: AXCXEPT/Qwen3-EZO-8B-beta
        layer_range: [0, 36]
merge_method: slerp
base_model: AXCXEPT/Qwen3-EZO-8B-beta
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
```
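Each anchor list under `t` is spread across the 36-layer range, giving every layer its own SLERP blend weight per filter. A simplified sketch of that expansion, assuming plain linear interpolation between anchor points (mergekit's exact interpolation may differ in details):

```python
def layer_t(anchors, num_layers):
    """Linearly interpolate anchor values to one blend weight per layer.

    Sketch only: expands a t schedule like [0, 0.5, 0.3, 0.7, 1]
    across num_layers layers.
    """
    ts = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor axis [0, len(anchors) - 1].
        x = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = int(x)
        hi = min(lo + 1, len(anchors) - 1)
        frac = x - lo
        ts.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return ts

# self_attn schedule from the config above: starts at 0, ends at 1.
self_attn_t = layer_t([0, 0.5, 0.3, 0.7, 1], 36)
print(round(self_attn_t[0], 3), round(self_attn_t[-1], 3))  # → 0 1.0
```

The mirrored `mlp` schedule (`[1, 0.5, 0.7, 0.3, 0]`) means attention layers lean toward one parent where the MLP layers lean toward the other, with the unfiltered `value: 0.5` covering all remaining tensors.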