tdrussell committed on
Commit 19be2a7
1 Parent(s): 78b4c56

Add evaluation metrics image

Files changed (1): README.md +3 -0
README.md CHANGED
@@ -9,6 +9,9 @@ Because this was trained on Instruct, you can use the normal Instruct chat forma
 Trained on 4 4090s using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).
 Dataset consists of about 800 books in the fiction genre, totaling 570 MB of raw text.
 Rank 64 QLoRA trained at 8192 sequence length.
+### Evaluation metrics
+
+<img src="https://i.imgur.com/sCMjix4.png" width="800" />
 
 ## Why no 8B?
 I tried multiple times to train this on Llama 3 8B Instruct, using a variety of hyperparameters. It never worked well. The model took a huge hit to intelligence every time, to the point of being unusable. 70B fared much better. I don't know why; maybe 8B is just too small for this type of technique and loses too much of the instruction-tuned smarts.
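Since the README notes the model keeps the normal Instruct chat format, here is a minimal sketch of loading a rank-64 LoRA adapter on top of Llama 3 70B Instruct and prompting it through that chat template. The adapter id `your-name/llama3-70b-fiction-lora` is a placeholder, and loading the base model in 4-bit via bitsandbytes is an assumption made to match the QLoRA setup described above, not something this commit specifies.

```python
# Minimal sketch: load the base Instruct model in 4-bit, attach a LoRA adapter,
# and prompt it with the standard Instruct chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # assumed 4-bit loading
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Placeholder adapter id; substitute the actual repo for this model.
model = PeftModel.from_pretrained(base, "your-name/llama3-70b-fiction-lora")

messages = [
    {"role": "user", "content": "Write the opening paragraph of a mystery novel."}
]
# apply_chat_template formats the conversation the way Instruct expects.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=300)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```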