Triangle104 committed
Commit d7b3e1b · verified · 1 Parent(s): d441733

Update README.md

Files changed (1)
  1. README.md +51 -0
README.md CHANGED
@@ -13,6 +13,57 @@ thumbnail: https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362
  This model was converted to GGUF format from [`ArliAI/QwQ-32B-ArliAI-RpR-v3`](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v3) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v3) for more details on the model.

+ ---
+ RpR (RolePlay with Reasoning) is a new series of models from ArliAI. The series builds directly on the successful dataset curation methodology and training methods developed for the RPMax series.
+
+ RpR models use the same curated, deduplicated RP and creative-writing dataset used for RPMax, with a focus on variety to ensure high creativity and minimize cross-context repetition. Users familiar with RPMax will recognize its unique, non-repetitive writing style, unlike other models finetuned for RP.
+
+ With the release of QwQ as the first high-performing open-source reasoning model that can be easily trained, it was clear that the available instruct and creative-writing reasoning datasets contain only one response per example. This type of single-response dataset causes degraded output quality in long multi-turn chats when used to train reasoning models, which is why Arli AI decided to create a real RP model capable of long multi-turn chat with reasoning.
+
+ To create RpR, we first had to build the reasoning RP dataset by re-processing our existing, known-good RPMax dataset into a reasoning dataset. This was done by using the base QwQ Instruct model itself to generate the reasoning process for every turn in the RPMax conversation examples, which was then further refined to make sure the reasoning stays in line with the actual response examples from the dataset.
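
As a rough illustration of that re-processing step, here is a minimal Python sketch: it walks an existing RP conversation and asks a QwQ-style instruct model to write the reasoning that leads to each known-good assistant reply, storing the trace alongside the turn for later refinement. The `generate()` helper, the prompt wording, and the turn layout are illustrative assumptions, not ArliAI's actual pipeline.

```python
# Hypothetical sketch only: re-process an existing RP conversation into a
# reasoning dataset by asking a QwQ-style instruct model for the reasoning
# behind each known-good assistant reply. `generate()` is a placeholder for
# whatever inference backend you use; the prompt wording is illustrative.

def generate(prompt: str) -> str:
    raise NotImplementedError("wire this to your QwQ inference endpoint")

def add_reasoning(conversation):
    """conversation: list of {'role': 'user' | 'assistant', 'content': str} turns."""
    enriched, context = [], ""
    for turn in conversation:
        if turn["role"] == "assistant":
            prompt = (
                f"{context}\n"
                "Write the step-by-step reasoning a roleplayer would go through "
                f"before replying with exactly this message:\n{turn['content']}"
            )
            # Each generated trace is later checked/refined against the reply.
            turn = {**turn, "reasoning": generate(prompt)}
        enriched.append(turn)
        context += f"{turn['role']}: {turn['content']}\n"
    return enriched
```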
+
+ Another important thing to get right is making sure the model is trained on examples that present reasoning blocks the same way it encounters them during inference: that is, never seeing reasoning blocks in its context. To do this, the training run was completed using axolotl with a manual, template-free segments dataset, ensuring the model is never trained to see a reasoning block in its context, just as it will be used at inference time.
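
As a rough sketch of what such a row can look like, assuming axolotl's template-free `input_output` segment format (each segment is a `text` string plus a `label` flag, with only labeled text contributing to the loss): earlier turns carry only their final replies, so no reasoning block appears in the context, while the current turn's reasoning plus reply is the trained target. The chat tags, helper function, and example data below are assumptions for illustration, not the actual RpR training data.

```python
import json

# Hypothetical sketch only: build one training row in the style of axolotl's
# template-free "input_output" format (segments of {"label", "text"}).
# The chat tags and segmenting are assumptions; the key idea from the card is
# that earlier turns appear WITHOUT their reasoning blocks, and only the
# current turn's reasoning + reply is trained on (label=True).

def build_row(history, user_msg, reasoning, response):
    """history: list of (user_text, assistant_reply) pairs, reasoning already stripped."""
    context = ""
    for past_user, past_reply in history:
        # Earlier assistant turns carry only the final reply -- no <think> block --
        # so the model never sees reasoning inside its context.
        context += f"<|im_start|>user\n{past_user}<|im_end|>\n"
        context += f"<|im_start|>assistant\n{past_reply}<|im_end|>\n"
    context += f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"

    target = f"<think>\n{reasoning}\n</think>\n{response}<|im_end|>\n"
    return {
        "segments": [
            {"label": False, "text": context},  # context only, masked from the loss
            {"label": True, "text": target},    # reasoning + reply, trained on
        ]
    }

row = build_row(
    history=[("Hi there.", "*waves* Hello, traveler.")],
    user_msg="What brings you to the tavern?",
    reasoning="The character is guarded but curious; keep the reply short and wry.",
    response="*leans back* Just the rain, same as you.",
)
print(json.dumps(row, ensure_ascii=False, indent=2))
```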
+
+ The result of training QwQ on this dataset with this method is consistently coherent and interesting output, even in long multi-turn RP chats. As far as we know, this is the first true, correctly trained reasoning model for RP and creative writing.
+
+ You can access the model at https://arliai.com and we also have a models ranking page at https://www.arliai.com/models-ranking
+
+ Ask questions in our new Discord server https://discord.com/invite/t75KbPgwhk or on our subreddit https://www.reddit.com/r/ArliAI/
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)