RoadToNowhere committed on
Commit f6efc5b · verified · 1 Parent(s): 63615b9

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -42,6 +42,16 @@ extra_gated_prompt: '**Usage Warnings**
  This model was converted to GGUF format from [`huihui-ai/QwenLong-L1-32B-abliterated`](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) for more details on the model.
 
+ ## ♾️ Processing Long Documents
+
+ For inputs where the total length (input plus output) significantly exceeds 32,768 tokens, we recommend using RoPE scaling to handle long texts effectively. We have validated the model's performance at context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
+
+ For `llama-server` from `llama.cpp`, you can use:
+ ```shell
+ llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
+ ```
+
+
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
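
The `--rope-scale 4` value in the added README section matches the ratio between the validated context length (131,072 tokens) and the original one (32,768 tokens). The sketch below shows that arithmetic and prints a candidate `llama-server` command instead of running it. `MODEL_PATH` and its default `model.gguf` are hypothetical placeholders, and passing the target length through `-c` is an assumption on top of the README's command, which elides its other arguments.

```shell
# Context lengths taken from the README text in this commit.
ORIG_CTX=32768        # original context window before scaling
TARGET_CTX=131072     # longest context length reported as validated with YaRN

# Scale factor is the ratio of target to original context: 131072 / 32768 = 4.
ROPE_SCALE=$(( TARGET_CTX / ORIG_CTX ))

# Hypothetical placeholder: point MODEL_PATH at your local GGUF file.
MODEL_PATH="${MODEL_PATH:-model.gguf}"

# Assumption: -c sets the context size to the target length.
CMD="llama-server -m ${MODEL_PATH} -c ${TARGET_CTX} --rope-scaling yarn --rope-scale ${ROPE_SCALE} --yarn-orig-ctx ${ORIG_CTX}"

# Print the command for review rather than launching the server.
echo "${CMD}"
```

A smaller target such as 65,536 tokens would give a scale factor of 2 under the same arithmetic; whether the model behaves well at other factors is not stated in the README.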