This model was converted to GGUF format from [`huihui-ai/QwenLong-L1-32B-abliterated`](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/huihui-ai/QwenLong-L1-32B-abliterated) for more details on the model.

## ♾️ Processing Long Documents

For inputs where the total length (including both input and output) significantly exceeds 32,768 tokens, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
For `llama-server` from `llama.cpp`, you can use:
```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```
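Here `--rope-scale 4` extends the model's native 32,768-token window by a factor of four (4 × 32,768 = 131,072, matching the validated YaRN length), and `--yarn-orig-ctx 32768` declares the pre-scaling context size. You will likely also want to raise the server's context size to match, e.g. `-c 131072`. Note that this static YaRN configuration is applied regardless of input length, so enable it only when you actually need long contexts.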
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
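For reference, the install command (assuming the standard Homebrew `llama.cpp` formula, which ships the `llama-cli` and `llama-server` binaries):

```shell
# Install llama.cpp via Homebrew (macOS and Linux)
brew install llama.cpp
```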
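You can then run the model straight from a Hugging Face repo. A minimal sketch, assuming `llama-cli`'s `--hf-repo`/`--hf-file` flags; the repo and file names below are hypothetical placeholders, so substitute the actual GGUF repo ID and quant filename shown on this page:

```shell
# Placeholder names -- replace with this repo's real GGUF ID and filename
llama-cli --hf-repo your-username/QwenLong-L1-32B-abliterated-GGUF \
  --hf-file qwenlong-l1-32b-abliterated-q4_k_m.gguf \
  -p "Briefly explain RoPE scaling."
```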