Data Labeling for Evaluation Datasets:
- Hybrid: Human/Synthetic/Automatic

## Evaluation Results

These results cover both “Reasoning On” and “Reasoning Off” modes. We recommend using temperature=`0.6`, top_p=`0.95` for “Reasoning On” mode, and greedy decoding for “Reasoning Off” mode. All evaluations are done with a 32k sequence length. We run each benchmark up to 16 times and average the scores for accuracy.
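As a minimal sketch, the recommended decoding settings above can be expressed as generation kwargs. The helper name `decoding_params` is illustrative (not from this README), and the keyword names follow the common Hugging Face `generate()` convention, which is an assumption here:

```python
def decoding_params(reasoning_on: bool) -> dict:
    """Return decoding kwargs for the given reasoning mode (illustrative helper)."""
    if reasoning_on:
        # "Reasoning On": sample with temperature 0.6 and top_p 0.95, as recommended.
        return {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
    # "Reasoning Off": greedy decoding (sampling disabled).
    return {"do_sample": False}

print(decoding_params(True))   # sampling settings for "Reasoning On"
print(decoding_params(False))  # greedy settings for "Reasoning Off"
```

These kwargs could then be passed to a generation call (e.g. `model.generate(**decoding_params(True), ...)` with a Transformers model), but the exact inference stack is not specified by this README.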

> NOTE: Where applicable, a Prompt Template will be provided. While completing benchmarks, please ensure that you are parsing for the correct output format as per the provided prompt in order to reproduce the benchmarks seen below.