Update README.md
README.md
CHANGED
@@ -38,13 +38,13 @@ Here I explore whether training on long sequences that have clear conceptual dep
 
 ## Relative Performance (perplexity)
 
-| Context (tokens) | airophin-13b-pntk-16k-fp16| bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 |bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-
+| Context (tokens) | airophin-13b-pntk-16k-fp16| bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 |bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
 | ---| ----- | -----| ------| --- |
-| 512 | 7.62 | 8.24 | 7.90 | **
-| 1024 | 6.20 | 6.71 | 6.17 | **5.
-| 2048 | 5.38 | 5.87 | 5.23 | **
-| 4096 | 5.08 | 5.50 | **4.91** |
-| 8192 | 4.90 | 5.32 | -- |
+| 512 | 7.62 | 8.24 | 7.90 | **7.23** |
+| 1024 | 6.20 | 6.71 | 6.17 | **5.85** |
+| 2048 | 5.38 | 5.87 | 5.23 | **5.07** |
+| 4096 | 5.08 | 5.50 | **4.91** | 4.77 |
+| 8192 | 4.90 | 5.32 | -- | 57.1 |
 | 12000 | 4.82 | 56.1 | -- | -- |
 
 - This model is competitive with the Llama-1 33b variants, outperforming the best long context model for short sequences.
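
For readers reproducing the table above: the figures are perplexities measured at each context length. Below is a minimal sketch of this kind of measurement with HuggingFace `transformers`; the repo id, the WikiText-2 evaluation text, and the non-overlapping-window scheme are illustrative assumptions, not necessarily the exact setup behind these numbers.

```python
# Hedged sketch: perplexity of a causal LM at one fixed context length.
# The model id and dataset below are illustrative assumptions.
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bhenrym14/airophin-13b-pntk-16k-fp16"  # assumed repo id; any causal LM works
CONTEXT = 4096                                  # one of the table's context lengths

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"  # device_map needs `accelerate`
)
model.eval()

# Concatenate a long evaluation corpus into a single token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

# Score non-overlapping windows of CONTEXT tokens; accumulate per-token NLL.
nll_sum, n_targets = 0.0, 0
for start in range(0, ids.size(1) - CONTEXT + 1, CONTEXT):
    window = ids[:, start : start + CONTEXT].to(model.device)
    with torch.no_grad():
        # With labels=input_ids, transformers shifts labels internally and
        # returns the mean next-token loss over the window's CONTEXT - 1 targets.
        loss = model(window, labels=window).loss
    nll_sum += loss.float().item() * (CONTEXT - 1)
    n_targets += CONTEXT - 1

print(f"perplexity @ {CONTEXT} tokens: {math.exp(nll_sum / n_targets):.2f}")
```

Sweeping `CONTEXT` over 512 to 12000 yields a column like those in the table; the sharp jumps to ~56-57 PPL are consistent with evaluating a model past its usable context window (presumably 4k for the base Llama-2 model and 8k for the PI-8192 variant).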