Update README.md
README.md CHANGED
@@ -60,7 +60,10 @@ Here I explore whether training on long sequences that have clear conceptual dep
 | 8192 | 4.90 | | -- | -- | -- |
 | 12000 | 4.82 | | -- | -- | -- |
 
-
+- This model is competitive with the Llama-1 33b variants, outperforming the best long-context model on short sequences.
+- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of 54.9. While not an appreciable improvement, the fact that there was no performance regression despite the context extension is notable.
+- Perplexity continues to decline out to 12000 tokens, the longest context length I tested due to VRAM constraints.
+-
 
 ## Quantization:
 
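As a reference for how the perplexity figures above can be reproduced, here is a minimal sketch of chunked perplexity evaluation at a fixed context length using Hugging Face `transformers`. The checkpoint name, the evaluation text file, and the non-overlapping windowing are illustrative assumptions, not the exact script behind the table.

```python
# Minimal sketch of perplexity-at-context-length evaluation (illustrative only).
# MODEL_NAME and "eval_text.txt" are placeholders, not the repo's actual eval setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-13b-hf"  # substitute the extended-context checkpoint
CONTEXT_LEN = 8192                        # one of the context lengths in the table

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


def perplexity(text: str, context_len: int = CONTEXT_LEN) -> float:
    """Exponentiated mean next-token loss over non-overlapping windows of context_len tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    losses = []
    for start in range(0, ids.size(0) - 1, context_len):
        window = ids[start : start + context_len].unsqueeze(0).to(model.device)
        if window.size(1) < 2:  # need at least one next-token target
            continue
        with torch.no_grad():
            out = model(window, labels=window)  # HF shifts labels internally
        losses.append(out.loss.float().cpu())
    return torch.exp(torch.stack(losses).mean()).item()


if __name__ == "__main__":
    with open("eval_text.txt") as f:  # any long held-out document
        print(f"ppl @ {CONTEXT_LEN}: {perplexity(f.read()):.2f}")
```

Running the same text with `CONTEXT_LEN` set to 8192 and then 12000 mirrors the comparison in the table above, VRAM permitting.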