Jason Cheng committed
Commit 1c83592 · verified · 1 Parent(s): b657be7

Fix typos, add image

Files changed (1)
  1. README.md +91 -87
README.md CHANGED
@@ -1,88 +1,92 @@
- ---
- base_model:
- - anthracite-org/magnum-v3-9b-customgemma2
- - nbeerbower/gemma2-gutenberg-9B
- - grimjim/Magnolia-v1-Gemma2-8k-9B
- - UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
- - BeaverLegacy/Smegmma-Deluxe-9B-v1
- - ifable/gemma-2-Ifable-9B
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # temp
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the SLERP method to create an intermediate model. The [Model Stock](https://arxiv.org/abs/2403.19522) merge method used the SLERP model as a base to mix in more models.
-
- The idea was to make a nice and smart base model and add in a few pinches of spice.
-
- For some reason it wouldn't let me use any other merge method- it gave me ModelReference errors about my
- intermediary model for every method except Model Stock for some reason. I'll see if I can fix it and
- upload my intended task-arithmetic version as a v2.
-
- This is the only one of my like 700 merges that I think uses something novel/interesting
- enough in its creation to merit an upload.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [anthracite-org/magnum-v3-9b-customgemma2](https://huggingface.co/anthracite-org/magnum-v3-9b-customgemma2)
- * [nbeerbower/gemma2-gutenberg-9B](https://huggingface.co/nbeerbower/gemma2-gutenberg-9B)
-
- * [grimjim/Magnolia-v1-Gemma2-8k-9B](https://huggingface.co/grimjim/Magnolia-v1-Gemma2-8k-9B)
- * [UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)
- * [BeaverLegacy/Smegmma-Deluxe-9B-v1](https://huggingface.co/BeaverLegacy/Smegmma-Deluxe-9B-v1)
- * [ifable/gemma-2-Ifable-9B](https://huggingface.co/ifable/gemma-2-Ifable-9B)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- # THIS YAML CONFIGURATION WAS USED TO CREATE THE INTERMEDIARY MODEL.
- # slices:
- #   - sources:
- #       - model: anthracite-org/magnum-v3-9b-customgemma2
- #         layer_range: [0, 42]
- #       - model: nbeerbower/gemma2-gutenberg-9B
- #         layer_range: [0, 42]
- # merge_method: slerp
- # base_model: nbeerbower/gemma2-gutenberg-9B
- # parameters:
- #   t:
- #     - filter: self_attn
- #       value: [0.2, 0.5, 0.4, 0.7, 1]
- #     - filter: mlp
- #       value: [1, 0.5, 0.3, 0.4, 0.2]
- #     - value: 0.5
- # dtype: float16
-
- # THIS YAML CONFIGURATION WAS USED TO CREATE ASTER. The E: model is the intermediate
- # model created in the previous config.
- models:
-   - model: E:/models/mergekit/output/intermediate/
-   - model: BeaverLegacy/Smegmma-Deluxe-9B-v1
-     parameters:
-       weight: 0.3
-   - model: ifable/gemma-2-Ifable-9B
-     parameters:
-       weight: 0.3
-   - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
-     parameters:
-       weight: 0.15
-   - model: grimjim/Magnolia-v1-Gemma2-8k-9B
-     parameters:
-       weight: 0.25
- merge_method: model_stock
- base_model: E:/models/mergekit/output/intermediate/
- dtype: float16
- ```
-
+ ---
+ base_model:
+ - anthracite-org/magnum-v3-9b-customgemma2
+ - nbeerbower/gemma2-gutenberg-9B
+ - grimjim/Magnolia-v1-Gemma2-8k-9B
+ - UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
+ - BeaverLegacy/Smegmma-Deluxe-9B-v1
+ - ifable/gemma-2-Ifable-9B
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+
+ ---
+
+ ![Image from Google Images](https://cdn-lfs-us-1.hf.co/repos/18/09/180999b41a1608d2b6cc42a0390d6443b458650f46f9272f446133b029c7c3e1/da5496d25fce344d4251a87cc4dae68b39c80251ebb51f246e3f3f7e94dcdf8c?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27aster.jpg%3B+filename%3D%22aster.jpg%22%3B&response-content-type=image%2Fjpeg&Expires=1729458155&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyOTQ1ODE1NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzE4LzA5LzE4MDk5OWI0MWExNjA4ZDJiNmNjNDJhMDM5MGQ2NDQzYjQ1ODY1MGY0NmY5MjcyZjQ0NjEzM2IwMjljN2MzZTEvZGE1NDk2ZDI1ZmNlMzQ0ZDQyNTFhODdjYzRkYWU2OGIzOWM4MDI1MWViYjUxZjI0NmUzZjNmN2U5NGRjZGY4Yz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=LR4Qtaxn8KGxx0sYfP4YqVziM38FcYTAyz0FLB7-PFEG9ffiQVQzNSp0d0sBH1CHEOxWF-A8-yyRxau9hUKnXeChYwS5aud8SzpyiU-F0qR9pDkz2dP5MIeU28BuTb4h1GIa2PumTNAte74G5-komB23YS0V1YRcfXhhd8vphG0HKjq24aJW6f2cDqUQ%7E6i9BsYvgzkXKWGPHwLPr%7EhjuB%7EI4QKbnryJXpCDMda52n3auwgEHPhQb%7E7BETVjhzTATW2eBBZCRoXIrlxH92sJhknA7LKtSgNFhHEke8FZzosfNS12Sk41e39HJB9DC4dc4KPLRZr5Tbdcz88uq1vmqw__&Key-Pair-Id=K24J24Z295AEI9)
+
+ # temp
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the SLERP method to create an intermediate model. I then used the
+ [Model Stock](https://arxiv.org/abs/2403.19522) merge method with that SLERP model as the base.
+
+ The idea was to make a nice and smart base model and add in a few pinches of spice.
+
+ For some reason mergekit wouldn't let me use any other merge method: it gave me ModelReference errors about my
+ intermediary model for every method except Model Stock. I'll see if I can fix it and
+ upload my intended task-arithmetic version as a v2.
+
+ This is the only one of my like 700 merges that I think uses something novel/interesting
+ enough in its creation to merit an upload.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [anthracite-org/magnum-v3-9b-customgemma2](https://huggingface.co/anthracite-org/magnum-v3-9b-customgemma2)
+ * [nbeerbower/gemma2-gutenberg-9B](https://huggingface.co/nbeerbower/gemma2-gutenberg-9B)
+ * [grimjim/Magnolia-v1-Gemma2-8k-9B](https://huggingface.co/grimjim/Magnolia-v1-Gemma2-8k-9B)
+ * [UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)
+ * [BeaverLegacy/Smegmma-Deluxe-9B-v1](https://huggingface.co/BeaverLegacy/Smegmma-Deluxe-9B-v1)
+ * [ifable/gemma-2-Ifable-9B](https://huggingface.co/ifable/gemma-2-Ifable-9B)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ # THIS YAML CONFIGURATION WAS USED TO CREATE THE INTERMEDIARY MODEL.
+ # slices:
+ #   - sources:
+ #       - model: anthracite-org/magnum-v3-9b-customgemma2
+ #         layer_range: [0, 42]
+ #       - model: nbeerbower/gemma2-gutenberg-9B
+ #         layer_range: [0, 42]
+ # merge_method: slerp
+ # base_model: nbeerbower/gemma2-gutenberg-9B
+ # parameters:
+ #   t:
+ #     - filter: self_attn
+ #       value: [0.2, 0.5, 0.4, 0.7, 1]
+ #     - filter: mlp
+ #       value: [1, 0.5, 0.3, 0.4, 0.2]
+ #     - value: 0.5
+ # dtype: float16
+
+ # THIS YAML CONFIGURATION WAS USED TO CREATE ASTER. The E: model is the intermediate
+ # model created in the previous config.
+ models:
+   - model: E:/models/mergekit/output/intermediate/
+   - model: BeaverLegacy/Smegmma-Deluxe-9B-v1
+     parameters:
+       weight: 0.3
+   - model: ifable/gemma-2-Ifable-9B
+     parameters:
+       weight: 0.3
+   - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
+     parameters:
+       weight: 0.15
+   - model: grimjim/Magnolia-v1-Gemma2-8k-9B
+     parameters:
+       weight: 0.25
+ merge_method: model_stock
+ base_model: E:/models/mergekit/output/intermediate/
+ dtype: float16
+ ```
+
  Alright, now back to smashing models together and seeing what happens...
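
For the curious, here is a rough, untested sketch of what the intended task-arithmetic v2 could look like, assuming mergekit's standard `task_arithmetic` merge method. This is not the author's actual recipe: the weights are placeholders carried over from the Model Stock config above, and the base is the same local intermediate SLERP model.

```yaml
# HYPOTHETICAL SKETCH, UNTESTED -- not a config from this repo.
# task_arithmetic scales each task vector (model minus base) by `weight`
# and adds it to the base model; the weights below are placeholders.
models:
  - model: BeaverLegacy/Smegmma-Deluxe-9B-v1
    parameters:
      weight: 0.3
  - model: ifable/gemma-2-Ifable-9B
    parameters:
      weight: 0.3
  - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
    parameters:
      weight: 0.15
  - model: grimjim/Magnolia-v1-Gemma2-8k-9B
    parameters:
      weight: 0.25
merge_method: task_arithmetic
base_model: E:/models/mergekit/output/intermediate/
dtype: float16
```

A config like this is run with mergekit's CLI, e.g. `mergekit-yaml config.yaml ./output-model-directory`.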