# merge

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the SLERP merge method, with deepseek-ai/DeepSeek-R1-Distill-Llama-8B as the base model.
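
SLERP interpolates each pair of corresponding weight tensors along the arc between them rather than along a straight line, which better preserves tensor norms than plain averaging. The snippet below is a minimal sketch of the idea, not mergekit's actual implementation; the `slerp` helper, tensor shapes, and random stand-in weights are illustrative assumptions.

```python
# Minimal sketch of spherical linear interpolation (SLERP) between two weight
# tensors. Shapes and the random stand-in weights are illustrative only.
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Interpolate from base tensor `a` (t=0) toward tensor `b` (t=1)."""
    a_unit = a.ravel() / (np.linalg.norm(a) + eps)
    b_unit = b.ravel() / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    if 1.0 - abs(dot) < eps:
        # Nearly parallel tensors: the spherical formula becomes numerically
        # unstable, so fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    omega = np.arccos(dot)      # angle between the two tensors
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / sin_omega) * a + (np.sin(t * omega) / sin_omega) * b

# Example: blend one projection matrix at t=0.8 (the mid-network plateau used below).
base = np.random.randn(4096, 4096).astype(np.float32)   # stands in for a DeepSeek-R1-Distill tensor
other = np.random.randn(4096, 4096).astype(np.float32)  # stands in for a Sunfall tensor
merged = slerp(0.8, base, other)
```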

### Models Merged

The following models were included in the merge:

* Vdr1/L3-8B-Sunfall-v0.4-Lunar-Stheno
* deepseek-ai/DeepSeek-R1-Distill-Llama-8B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
# --- Merge Configuration: Linear Ramp with Mild Head/Input Blend ---
models:
  - model: Vdr1/L3-8B-Sunfall-v0.4-Lunar-Stheno
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
merge_method: slerp
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
dtype: bfloat16

# t-schedule per layer: small Sunfall on input/head, ramping up linearly,
# plateau at 0.8 through the mid-network, then ramp down slightly at the end.
parameters:
  t: [0.2, 0.4, 0.6, 0.8, 0.8, 0.6, 0.4, 0.2, 0.2]
# Interpretation of the nine anchor values (interpolated across the layer stack,
# not one value per block):
#   [Embedding,   ramp up,      mid-network plateau,   ramp down,   LM Head]
#    0.2 Sunfall  0.4 -> 0.6        0.8 / 0.8          0.6 -> 0.4   0.2 Sunfall

# Notes:
# 1. Using a slightly nonzero t=0.2 at both the embedding layer and LM head
#    helps align scales and prevents representational drift when Sunfall’s
#    mid-network activations become stronger.
# 2. The linear ramp (0.2 → 0.8) avoids any sudden “hard switch” that can cause
#    instability or hallucination bubbles in a single block.
# 3. The plateau of t=0.8 across four central blocks emphasizes Sunfall’s style
#    without giving it 100% in any one layer—this typically yields more coherent
#    blending than a full t=1.0 peak.
# 4. You can experiment with extending or contracting the plateau (e.g. 3 vs. 5
#    central blocks) based on your qualitative outputs.

# Keeping t=0.2 at both ends leaves the DeepSeek-R1-Distill base dominant at the
# embedding layer and LM head, so input and output behavior stays close to the base model.
```
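
For intuition about how the nine `t` anchors translate into per-layer blend weights, mergekit interpolates list-valued parameters across the model's layer stack. The sketch below assumes a 32-layer Llama-8B stack and simple linear interpolation with `numpy.interp`; it approximates the resulting schedule and is not mergekit's internal scheduling code.

```python
# Approximate per-layer t values implied by the nine-point gradient above,
# assuming the anchors are spread evenly across a 32-layer Llama-8B stack.
import numpy as np

anchors = [0.2, 0.4, 0.6, 0.8, 0.8, 0.6, 0.4, 0.2, 0.2]
num_layers = 32

anchor_positions = np.linspace(0, num_layers - 1, num=len(anchors))
layer_t = np.interp(np.arange(num_layers), anchor_positions, anchors)

for layer, t in enumerate(layer_t):
    print(f"layer {layer:2d}: t = {t:.3f}  ({t:.0%} Sunfall / {1 - t:.0%} DeepSeek)")
```

To reproduce the merge, the configuration above can be saved to a file and passed to mergekit's command-line entry point, for example `mergekit-yaml config.yaml ./merged-model`.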