Triangle104 commited on
Commit
01a0141
·
verified ·
1 Parent(s): f74a46e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +45 -0
README.md CHANGED
@@ -15,6 +15,51 @@ tags:
15
  This model was converted to GGUF format from [`ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small`](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
16
  Refer to the [original model card](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) for more details on the model.
17
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
  ## Use with llama.cpp
19
  Install llama.cpp through brew (works on Mac and Linux)
20
 
 
15
  This model was converted to GGUF format from [`ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small`](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
16
  Refer to the [original model card](https://huggingface.co/ArliAI/DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small) for more details on the model.
17
 
18
+ ---
19
+ RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series.
20
+
21
+
22
+ RpR models use the same curated, deduplicated RP and creative writing
23
+ dataset used for RPMax, with a focus on variety to ensure high
24
+ creativity and minimize cross-context repetition. Users familiar with
25
+ RPMax will recognize the unique, non-repetitive writing style unlike
26
+ other finetuned-for-RP models.
27
+
28
+
29
+ With the release of QwQ as the first high performing open-source
30
+ reasoning model that can be easily trained, it was clear that the
31
+ available instruct and creative writing reasoning datasets contains only
32
+ one response per example. This is type of single response dataset used
33
+ for training reasoning models causes degraded output quality in long
34
+ multi-turn chats. Which is why Arli AI decided to create a real RP model
35
+ capable of long multi-turn chat with reasoning.
36
+
37
+
38
+ In order to create RpR, we first had to actually create the reasoning
39
+ RP dataset by re-processing our existing known-good RPMax dataset into a
40
+ reasoning dataset. This was possible by using the base QwQ Instruct
41
+ model itself to create the reasoning process for every turn in the RPMax
42
+ dataset conversation examples, which is then further refined in order
43
+ to make sure the reasoning is in-line with the actual response examples
44
+ from the dataset.
45
+
46
+
47
+ Another important thing to get right is to make sure the model is
48
+ trained on examples that present reasoning blocks in the same way as it
49
+ encounters it during inference. Which is, never seeing the reasoning
50
+ blocks in it's context. In order to do this, the training run was
51
+ completed using axolotl with manual template-free segments dataset in
52
+ order to make sure that the model is never trained to see the reasoning
53
+ block in the context. Just like how the model will be used during
54
+ inference time.
55
+
56
+
57
+ The result of training on this dataset with this method are
58
+ consistently coherent and interesting outputs even in long multi-turn RP
59
+ chats. This is as far as we know the first true correctly-trained
60
+ reasoning model trained for RP and creative writing.
61
+
62
+ ---
63
  ## Use with llama.cpp
64
  Install llama.cpp through brew (works on Mac and Linux)
65