This model was converted to GGUF format from [`huihui-ai/QwenLong-L1-32B-abliterated`](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) for more details on the model.

## ♾️ Processing Long Documents

For inputs where the total length (including both input and output) significantly exceeds 32,768 tokens, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
For `llama-server` from `llama.cpp`, you can use:
```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```
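Here `--rope-scale 4` extends the model's native 32,768-token window by a factor of four (4 × 32,768 = 131,072, matching the validated YaRN length), and `--yarn-orig-ctx 32768` declares the pre-scaling context size. You will likely also want to raise the server's context size to match, e.g. `-c 131072`. Note that this static YaRN configuration is applied regardless of input length, so enable it only when you actually need long contexts.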
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
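For reference, the install command (assuming the standard Homebrew `llama.cpp` formula, which ships the `llama-cli` and `llama-server` binaries):

```shell
# Install llama.cpp via Homebrew (macOS and Linux)
brew install llama.cpp
```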
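You can then run the model straight from a Hugging Face repo. A minimal sketch, assuming `llama-cli`'s `--hf-repo`/`--hf-file` flags; the repo and file names below are hypothetical placeholders, so substitute the actual GGUF repo ID and quant filename shown on this page:

```shell
# Placeholder names -- replace with this repo's real GGUF ID and filename
llama-cli --hf-repo your-username/QwenLong-L1-32B-abliterated-GGUF \
  --hf-file qwenlong-l1-32b-abliterated-q4_k_m.gguf \
  -p "Briefly explain RoPE scaling."
```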