Update README.md
Browse files
README.md
CHANGED
@@ -56,7 +56,9 @@ MLX:
|
|
56 |
|
57 |
- Format is plain-old ChatML (please note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags for it not to reason -- see below).
|
58 |
|
59 |
-
- Settings used by testers varied, but
|
|
|
|
|
60 |
|
61 |
- The official instruction following version of Qwen3-8B was not used as a base. Instruction-following is trained in post-hoc, and "thinking" traces were not included. __As a result of this, "thinking" will not function.__
|
62 |
|
|
|
56 |
|
57 |
- Format is plain-old ChatML (please note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags for it not to reason -- see below).
|
58 |
|
59 |
+
- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min p. Do *not* use repetition penalties (DRY included). They break it.
|
60 |
+
|
61 |
+
- Any system prompt can likely be used, but I used the Shingame system prompt (link will be added later i promise)
|
62 |
|
63 |
- The official instruction following version of Qwen3-8B was not used as a base. Instruction-following is trained in post-hoc, and "thinking" traces were not included. __As a result of this, "thinking" will not function.__
|
64 |
|