about
A merge of an L3 70b model (Dolphin 2.9.1) into an L3.1 70b model used as the merge base (Tess 3), inspired by https://huggingface.co/sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1 and https://huggingface.co/jukofyork/Dusk-Miqu-70B.
- I went straight to "0.2" to leave room for an eventual "0.1" that follows Sophos' method to the letter, in case my edits don't pan out.
- I (re?)added q_proj, k_proj (the rope base frequency is similar between L3 and L3.1), input_layernorm and post_attention_layernorm (why not?) to Sophos' mix, so all tensors are merged the same way (I guess I should just have gone layer-wide then, but whatever lol).
- This first test uses rescale: false; the next one will use rescale: true.
benchs
- PPL 512 Wikitext Eng : 2.93 (excellent)
- ARC-C : 62.55 (very good)
- ARC-E : 82.80 (very good)
- Changes are minimal compared to Tess, to the point of noise (due to no rescale I guess)
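For readers unfamiliar with the PPL figure above, here is a minimal sketch of how perplexity is commonly defined: the exponential of the mean negative log-likelihood per token. The function name and the sample log-probabilities are hypothetical and only illustrate the formula; this is not the tool used for the benchmark, and reading "512" as the evaluation window size in tokens is my assumption.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical log-probabilities, for illustration only (not from this model).
example = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(example))
```

Lower is better: a perplexity of 2.93 roughly means the model is, on average, as uncertain as if it were choosing between about three equally likely tokens.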
tests
Let's see if it can hold long context (testing in progress):
- At 10k, it holds coherence.
- At 20k, it holds coherence.
- At 28k, it holds coherence.
- That's good enough to validate this step for now, because the merged material is an 8k context L3, not even a Miqu (32k context).
credits
Credits go to Jukofyork and Sophosympatheia, as well as to the Arcee/Mergekit folks and the model authors, of course.
merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Linear DELLA merge method using migtissera/Tess-3-Llama-3.1-70B as a base.
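To give an idea of what density, epsilon and rescale control in this kind of merge, here is a simplified sketch of magnitude-based pruning of the differences (deltas) from the base model. It is not mergekit's implementation: the function names are hypothetical, and the assumption that keep probabilities are spread by rank from density - epsilon (smallest magnitude) to density + epsilon (largest magnitude) follows my reading of the mergekit documentation.

```python
import random

def keep_probabilities(deltas, density, epsilon):
    """Rank-based keep probabilities (assumed scheme, see lead-in).

    The smallest |delta| gets density - epsilon, the largest gets
    density + epsilon, with a linear spread by rank in between.
    """
    n = len(deltas)
    if n == 1:
        return [density]
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    probs = [0.0] * n
    for rank, idx in enumerate(order):
        probs[idx] = (density - epsilon) + 2 * epsilon * rank / (n - 1)
    return probs

def magprune(deltas, density, epsilon, rescale=False, rng=None):
    """Randomly drop deltas; optionally rescale survivors by 1 / keep probability."""
    rng = rng or random.Random(0)
    pruned = []
    for delta, p in zip(deltas, keep_probabilities(deltas, density, epsilon)):
        if rng.random() < p:
            pruned.append(delta / p if rescale else delta)
        else:
            pruned.append(0.0)
    return pruned

# Toy deltas (hypothetical values), with the Dolphin settings from the config.
print(keep_probabilities([0.1, -0.5, 0.3], density=0.25, epsilon=0.05))
```

With rescale left off, the surviving deltas keep their original size while most of them are dropped, so the contribution of the pruned model shrinks overall. That would be consistent with the near-noise benchmark changes noted earlier, though this is only a guess.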
Models Merged
The following models were included in the merge:
- cognitivecomputations/dolphin-2.9.1-llama-3-70b
Configuration
The following YAML configuration was used to produce this model:
merge_method: della_linear
base_model: migtissera/Tess-3-Llama-3.1-70B
models:
  - model: cognitivecomputations/dolphin-2.9.1-llama-3-70b
    parameters:
      weight:
        - filter: q_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: k_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: input_layernorm
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: post_attention_layernorm
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
      density: 0.25
      epsilon: 0.05
      lambda: 1.0
  - model: migtissera/Tess-3-Llama-3.1-70B
    parameters:
      weight: 1.0
      density:
        - filter: q_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: k_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: v_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: o_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: input_layernorm
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: up_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: down_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: post_attention_layernorm
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - value: 0.5
      epsilon:
        - filter: q_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: k_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: v_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: o_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: input_layernorm
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: up_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: gate_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: down_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: post_attention_layernorm
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - value: 0.1
      lambda: 1.0
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
chat_template: auto
tokenizer:
  source: union
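The 11-value lists in the configuration above are gradients: each list is stretched over the depth of the model, so every layer gets its own value. Below is a minimal sketch of that mapping, assuming plain linear interpolation over the normalized layer position (which is how I understand mergekit's gradient handling) and the 80 layers of a Llama 3 70B. The helper name gradient_value is hypothetical.

```python
def gradient_value(gradient, layer_idx, num_layers):
    """Linearly interpolate a gradient list at a given layer position."""
    if len(gradient) == 1 or num_layers == 1:
        return float(gradient[0])
    # Position of this layer on the gradient's index axis, in [0, len - 1].
    pos = layer_idx / (num_layers - 1) * (len(gradient) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(gradient) - 1)
    frac = pos - lo
    return (1 - frac) * gradient[lo] + frac * gradient[hi]

# Dolphin weight gradient from the config, spread over 80 layers.
weight = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
per_layer = [gradient_value(weight, i, 80) for i in range(80)]

for i in (0, 8, 12, 40, 70, 79):
    print(i, round(per_layer[i], 3))
```

Under that assumption, the first and last tenth of the layers stay pure Tess (weight 0 for Dolphin), the weight ramps up over the next tenth, and the middle of the model gets the full Dolphin weight.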
Model tree for NexesMess/Llama_3.x_70b_Flipper_0.2