Update README.md
README.md
CHANGED
@@ -9,9 +9,9 @@ tags:
 
 # Mistral-7B-Instruct-v0.2-expanded
 
-This method employs mergekit's passthrough method to expand blocks within the "mistralai/Mistral-7B-Instruct-v0.2" model. For every
+This method employs mergekit's passthrough method to expand blocks within the "mistralai/Mistral-7B-Instruct-v0.2" model. For every 5th layer,
 a new layer is added, with the `o_proj` and `down_proj` parameters of these added layers initialized to zero, mirroring the approach used in LLaMA Pro.
-It's important to note that this configuration has not undergone fine-tuning. Therefore, when fine-tuning, ensure that only every
+It's important to note that this configuration has not undergone fine-tuning. Therefore, when fine-tuning, ensure that only every 5th layer is trainable,
 while all other layers remain frozen.
 
 ## 🧩 Configuration
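For reference, here is a minimal sketch (not part of the card itself) of the freezing step described in the diff above, using PyTorch and transformers. The checkpoint path is a placeholder, and the sketch assumes the inserted blocks can be identified by the zero-initialized `o_proj` weights the README describes.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path: point this at the expanded merge output.
model = AutoModelForCausalLM.from_pretrained("path/to/Mistral-7B-Instruct-v0.2-expanded")

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the inserted blocks. Rather than hard-coding "every 5th
# layer", detect them by their `o_proj` weights, which the merge leaves
# at exactly zero until fine-tuning begins (an assumption based on the
# zero-initialization described above).
for idx, layer in enumerate(model.model.layers):
    if torch.all(layer.self_attn.o_proj.weight == 0):
        for param in layer.parameters():
            param.requires_grad = True
        print(f"Layer {idx} is trainable")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} parameters")
```

Detecting the inserted layers by their zero weights, rather than hard-coding every 5th index, keeps the check valid regardless of how the passthrough slices were interleaved; run it before any training step, since the weights stop being exactly zero once updates begin.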