> [!TIP]
> We recommend setting `temperature=0.6` and `top_p=0.95` in the sampling parameters.
### Long context processing

The current `config.json` is set for a context length of up to 65,536 tokens. To handle longer inputs (128K or 256K tokens), we utilize YaRN: you can change `max_position_embeddings` and `rope_scaling` to:
```
{
  ...,
  "rope_scaling": {
    "factor": 2.0,  # 2 * 65536 = 131072
    "original_max_position_embeddings": 65536,
    "type": "yarn"
  }
}
```
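The same edit can also be applied programmatically. Below is a minimal Python sketch, assuming you hold the parsed `config.json` as a dict (the real file has many more keys than shown here); it sets the YaRN fields and derives the new `max_position_embeddings` from the scaling factor:

```python
import json

# Stand-in for the parsed config.json; the shipped file has many more keys.
config = {"max_position_embeddings": 65536}

# Add YaRN scaling: a factor of 2.0 doubles the context window.
config["rope_scaling"] = {
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
    "type": "yarn",
}

# Derive the extended limit: 2.0 * 65536 = 131072 tokens.
config["max_position_embeddings"] = int(
    config["rope_scaling"]["factor"]
    * config["rope_scaling"]["original_max_position_embeddings"]
)

print(json.dumps(config, indent=2))
```

For 256K contexts the same pattern applies with `"factor": 4.0`.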
### Enabling and Disabling Extended Thinking Mode

We enable extended thinking by default, so the example above generates the output with a reasoning trace. To choose between the two modes, you can provide the `/think` and `/no_think` flags through the system prompt, as shown in the snippet below with extended thinking disabled. The code for generating the response with extended thinking is the same, except that the system prompt should contain `/think` instead of `/no_think`.