eliebak HF Staff committed on
Commit b49c771 · verified · 1 Parent(s): 90486a3

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -87,6 +87,21 @@ print(tokenizer.decode(output_ids, skip_special_tokens=True))
  >[!TIP]
  > We recommend setting `temperature=0.6` and `top_p=0.95` in the sampling parameters.

+ ### Long context processing
+
+ The current `config.json` is set for a context length of up to 65,536 tokens. To handle longer inputs (128k or 256k tokens), we use YaRN: change `max_position_embeddings` and `rope_scaling` to:
+ ```
+ {
+   ...,
+   "rope_scaling": {
+     "factor": 2.0, # 2 × 65,536 = 131,072
+     "original_max_position_embeddings": 65536,
+     "type": "yarn"
+   }
+ }
+ ```
+
  ### Enabling and Disabling Extended Thinking Mode

  We enable extended thinking by default, so the example above generates output with a reasoning trace. To choose between the two modes, pass the `/think` or `/no_think` flag through the system prompt, as shown in the snippet below with extended thinking disabled. The code for generating a response with extended thinking is the same, except the system prompt should contain `/think` instead of `/no_think`.
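As a sanity check on the YaRN override above, here is a minimal Python sketch of the arithmetic behind the `rope_scaling` entry (the dict mirrors the keys added to `config.json`; applying it, e.g. via `transformers.AutoConfig`, is left out here):

```python
# Sketch of the YaRN override from this commit: a scaling factor of 2.0
# extends the original 65,536-token window to 131,072 tokens.
rope_scaling = {
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
    "type": "yarn",
}

# The extended context length implied by the override; this is the value
# `max_position_embeddings` should be raised to.
extended_context = int(
    rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
)
print(extended_context)  # 131072
```

A factor of 4.0 with the same base window would similarly target 262,144 tokens (the 256k case mentioned above).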