Update README.md
README.md
CHANGED
@@ -37,28 +37,15 @@ Unfortunately it has also been shown that LLMs frequently struggle to attend to
Here I explore whether training on long sequences that have clear conceptual dependencies residing in the middle of the context helps attenuate the difficulties in attending to middle-context tokens. When/if I have time, I hope to perform a more rigorous assessment of the performance with respect to this specific issue.

## Relative Performance (perplexity)
-| Model | Context (tokens) | Perplexity |
-| ---------------------------------------------------- | ----------- | ---------- |
-| TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 512 | **7.42** |
-| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 512 | 8.86 |
-| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 512 | 7.94 |
-| ---------------------------------------------------- | ----------- | ---------- |
-| TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 2048 | **5.02** |
-| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 2048 | 5.98 |
-| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 2048 | 5.28 |
-| ---------------------------------------------------- | ----------- | ---------- |
-| TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 4096 | 9848.0 |
-| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 4096 | 5.80 |
-| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 4096 | **5.15** |
-
-| Context (tokens) | airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-GPTQ | jondurbin/airoboros-33B-gpt4-1.4-GPTQ |
+
+| Context (tokens) | airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-33B-gpt4-1.4-GPTQ |
| ---| ------- | -----| ------ | --- | --- |
-| 512 | 7.62 | | 7.90 |
-| 1024 | 6.20 |
-| 2048 | 5.38 | | 5.23
-| 4096 | 5.08 | | **4.91** |
-| 8192 | 4.90 |
-| 12000 | 4.82 |
+| 512 | 7.62 | 8.24 | 7.90 | **6.36** |
+| 1024 | 6.20 | 6.71 | 6.17 | **5.12** |
+| 2048 | 5.38 | 5.87 | 5.23 | **4.43** |
+| 4096 | 5.08 | 5.50 | **4.91** | 54.5 |
+| 8192 | 4.90 | 5.32 | -- | -- |
+| 12000 | 4.82 | 56.1 | -- | -- |

- This model is competitive with the Llama-1 33b variants, outperforming the best long-context model at short sequence lengths.
- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of 54.9. While not an appreciable improvement, the fact that there was no performance regression despite the context extension is notable.