brucethemoose committed on
Commit
36fd7b4
1 Parent(s): 2f3eb91

Update README.md

Files changed (1)
  1. README.md +1 -11
README.md CHANGED
@@ -17,12 +17,8 @@ tags:
 https://github.com/yule-BUAA/MergeLM
 
 https://github.com/cg123/mergekit/tree/dare'
-***
 
 
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at story writing: [4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction) / [3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
-***
-
 Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
 ```
 models:
@@ -66,13 +62,7 @@ Being a Yi model, try disabling the BOS token and/or running a lower temperature
 Sometimes the model "spells out" the stop token as `</s>` like Capybara, so you may need to add `</s>` as an additional stopping condition. It also might respond to the llama-2 chat format.
 
 ***
-
-I run Yi models in exui for maximum context size on 24GB GPUs. You can fit about 47K context on an empty GPU at 4bpw, and exui's speed really helps at high context:
-
-https://github.com/turboderp/exui
-
-https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction
-
+24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at story writing: [4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction) / [3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
 ***
 
 Credits:
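The headline claim in the line this commit adds is that 45K-75K tokens of context fit alongside a 34B exl2 quant on a 24 GB card. As a rough sanity check, the sketch below estimates weight plus KV-cache VRAM; the architecture numbers (60 layers, 8 KV heads, head dim 128, ~34.4B params) are assumptions taken from the published Yi-34B config, and the 1-byte cache entry assumes exllamav2's 8-bit KV cache.

```python
# Back-of-the-envelope VRAM estimate for the "45K-75K context on 24 GB" claim.
# Architecture numbers are assumptions from the published Yi-34B config;
# treat the result as a rough sanity check, not a measurement.
GiB = 1024**3

def vram_estimate_gib(params_b, bpw, n_ctx, n_layers=60, n_kv_heads=8,
                      head_dim=128, cache_bytes=1):
    """Weights at `bpw` bits/param plus a K/V cache at `cache_bytes` per value."""
    weights = params_b * 1e9 * bpw / 8
    kv_per_token = 2 * n_kv_heads * head_dim * n_layers * cache_bytes  # K and V
    return (weights + n_ctx * kv_per_token) / GiB

# 4.0bpw quant with an 8-bit KV cache at ~45K context
print(f"4.0bpw @ 45K: {vram_estimate_gib(34.4, 4.0, 45_000):.1f} GiB")
# 3.1bpw quant with an 8-bit KV cache at ~75K context
print(f"3.1bpw @ 75K: {vram_estimate_gib(34.4, 3.1, 75_000):.1f} GiB")
```

Both points land around 21 GiB, which is consistent with the card's 45K (4.0bpw) to 75K (3.1bpw) range once some headroom is left for activations.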
 
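For completeness, here is a minimal sketch of the usage the card describes: loading one of the linked exl2 quants with the exllamav2 Python API, using the 8-bit KV cache for long context, disabling the BOS token, lowering the temperature, and adding `</s>` as an extra stop string. The class and method names follow exllamav2's bundled examples and may differ between library versions; the path, context length, and prompt are illustrative, not the author's exact setup.

```python
# Minimal sketch, not the author's exact setup: load an exl2 quant with
# exllamav2, cap the context to fit a 24 GB card, disable the BOS token,
# lower the temperature, and add "</s>" as an extra stop string.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction"  # local copy of the 4.0bpw quant
config.prepare()
config.max_seq_len = 47104  # cap context (~47K); see the linked post for tuning

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit KV cache roughly halves cache VRAM
model.load_autosplit(cache)                    # load weights, filling available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2StreamingGenerator(model, cache, tokenizer)
generator.set_stop_conditions([tokenizer.eos_token_id, "</s>"])  # catch the "spelled out" stop token

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # run a lower temperature, as the card suggests

prompt = "Write the opening scene of a slow-burn mystery novel."
input_ids = tokenizer.encode(prompt, add_bos=False)  # Yi models: try disabling the BOS token
generator.begin_stream(input_ids, settings)

while True:
    chunk, eos, _ = generator.stream()
    print(chunk, end="", flush=True)
    if eos:
        break
```

The string stop condition is what catches the model when it "spells out" `</s>` as text instead of emitting the actual EOS token.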