Triangle104/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small-Q4_K_S-GGUF

@@ -15,51 +15,6 @@ tags:
 This model was converted to GGUF format from [`ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small`](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) for more details on the model.
----
-RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series.
-RpR models use the same curated, deduplicated RP and creative writing dataset used for RPMax, with a focus on variety to ensure high creativity and minimize cross-context repetition. Users familiar with RPMax will recognize the unique, non-repetitive writing style unlike other finetuned-for-RP models.
-With the release of QwQ as the first high-performing open-source reasoning model that can be easily trained, it was clear that the available instruct and creative writing reasoning datasets contain only one response per example. This type of single-response dataset, when used for training reasoning models, causes degraded output quality in long multi-turn chats, which is why Arli AI decided to create a real RP model capable of long multi-turn chat with reasoning.
-In order to create RpR, we first had to create the reasoning RP dataset by re-processing our existing known-good RPMax dataset into a reasoning dataset. This was possible by using the base QwQ Instruct model itself to create the reasoning process for every turn in the RPMax dataset conversation examples, which was then further refined to make sure the reasoning is in line with the actual response examples from the dataset.
-Another important thing to get right is to make sure the model is trained on examples that present reasoning blocks in the same way as it encounters them during inference, that is, never seeing the reasoning blocks in its context. To do this, the training run was completed using axolotl with a manual template-free segments dataset, so that the model is never trained to see the reasoning block in the context, just as it will be used at inference time.
-The result of training on this dataset with this method is consistently coherent and interesting outputs even in long multi-turn RP chats. This is, as far as we know, the first correctly trained reasoning model for RP and creative writing.
----
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
