Nexesenex committed on
Commit
4ffed11
·
verified ·
1 Parent(s): 0b41d6a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ A merge between a L3 70b model (Dolphin 2.9.1) and a L3.1 70b base model (Tess 3
 
  - I made 10 different versions. I retained 3 RC (Formely in my NexesMess repo, this one was Flipper 0.38. Flipper 0.36 will be Tess Dolphin 1.1. Flipper 0.32 will be Tess Dolphin 1.0)
  - I (re?)added q.proj, k.proj (the rope base frequency is similar between L3 and L3.1), imput.layernorm and post.attention.layernorm (why not?) to Sophos' mix, so all tensors are merged the same way (I guess I should just have gone layer-wide then, but whatever lol).
- - This v1.2 respects a quasi-triangular shape for its merge gradient, from layer 1 to level 79. It has a relatively high perplexity. (3.95) The most "Dolphin inprinted", for kicks & giggles.
+ - This v1.2 respects a quasi-triangular shape for its merge gradient, from layer 1 to level 78 (or 0 to 79, I'm not even sure). It has a relatively high perplexity. (3.95) The most "Dolphin inprinted", for kicks & giggles.
  - the v1.1 leaves untouched the 4 first and 4 last layers of Tess. It has an intermediary perplexity. (3.75) A solid and balanced standalone version.
  - the V1.0 leaves untouched the 12 first and 12 last layers of Tess. It has the lowest perplexity. (3.55). Probably the most "mergeable" version with L3.1 and 3.3 models.
  - -> I will actually need to make a new version with 16/16 untouched layers (it will be v1.3), because that's the recommended recipe.
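
For readers trying to picture the "merge gradient" and "untouched layers" described in the README lines above, here is a minimal Python sketch. It is not the author's actual recipe: the helper name `triangular_gradient`, the `peak` value of 0.5, and the exact linear shape are assumptions made for illustration (the README only says "quasi-triangular" and gives no weights). It assumes 80 layers indexed 0 to 79 and returns one blend weight per layer for the secondary model (Dolphin), with 0.0 meaning the layer stays pure Tess.

```python
def triangular_gradient(num_layers=80, untouched=0, peak=0.5):
    """Hypothetical helper: per-layer blend weight for the secondary model.

    `untouched` layers at each end keep weight 0.0 (base model only).
    `peak` is an assumed maximum weight; the README does not state one.
    """
    weights = [0.0] * num_layers
    active = num_layers - 2 * untouched  # layers that actually get merged
    if active <= 0:
        return weights
    mid = (active - 1) / 2  # center of the merged span
    for i in range(active):
        # Close to 1.0 at the center, falling linearly toward both edges.
        shape = 1.0 - abs(i - mid) / (mid + 1)
        weights[untouched + i] = peak * shape
    return weights


# Illustrative variants mirroring the README's description (values assumed):
v1_2 = triangular_gradient(untouched=0)   # gradient across all layers
v1_1 = triangular_gradient(untouched=4)   # 4 first / 4 last layers untouched
v1_0 = triangular_gradient(untouched=12)  # 12 first / 12 last layers untouched
```

Under these assumptions, a larger `untouched` value leaves more of the base model intact at both ends, which matches the README's ordering where the 12/12 variant stays closest to Tess.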