This is a merge of pre-trained language models created using mergekit.
From there, each of the four threads was separately task-tuned on two datasets.

Various methods of combining those via merging were tested, with this one scoring highest on EQ-Bench as an indicator.

My understanding of the Model Stock merge method is that it reduces task adaptation to a significant degree, but also significantly limits the forgetting caused by training.
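That trade-off can be sketched roughly as follows (a toy NumPy illustration, not mergekit's actual implementation; the function name and tensors are made up): the fine-tuned weights are averaged, then interpolated back toward the pre-trained base by a ratio derived from how much the fine-tunes agree with each other, which dilutes task adaptation while restoring base-model behavior.

```python
import numpy as np

def model_stock_merge(base, tuned):
    """Merge fine-tuned weight tensors toward the base, Model Stock style.

    base:  pre-trained weight tensor (flattened)
    tuned: list of N fine-tuned versions of the same tensor

    The interpolation ratio t = N*cos(theta) / (1 + (N-1)*cos(theta)) is
    the one described in the Model Stock paper, where theta is the average
    pairwise angle between the fine-tuned deltas (w_i - base). Assumes
    nonzero deltas and broadly agreeing fine-tunes (cos(theta) > 0).
    """
    deltas = [w - base for w in tuned]
    n = len(deltas)
    # average pairwise cosine similarity between the fine-tune deltas
    cosines = []
    for i in range(n):
        for j in range(i + 1, n):
            cosines.append(
                np.dot(deltas[i], deltas[j])
                / (np.linalg.norm(deltas[i]) * np.linalg.norm(deltas[j]))
            )
    cos_t = float(np.mean(cosines))
    t = n * cos_t / (1 + (n - 1) * cos_t)
    w_avg = np.mean(tuned, axis=0)
    # pull the averaged fine-tune partway back toward the base:
    # small t => mostly base weights (strong forgetting-repair, weak adaptation)
    return t * w_avg + (1 - t) * base
```

With identical deltas the ratio reaches 1 and the full average is kept; with orthogonal (disagreeing) deltas it collapses toward the base, which is the sense in which the method trades adaptation for retained base capability.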
I hope the adaptation, especially over two stages, is still sufficient to aid the longer-context and multi-turn conversation abilities inherited from the ancestor models, and to add some individual style while retaining a fair amount of their capability.

This model's refusals are... not nonexistent, but certainly don't rely on them.