bhenrym14 committed
Commit 8ab4145
Parent: f38c375

Update README.md

Files changed (1)
  1. README.md +8 -21
README.md CHANGED
@@ -37,28 +37,15 @@ Unfortunately it has also been shown that LLM's frequently struggle to attend to
  Here I explore whether training on long sequences that have clear conceptual dependencies residing in the middle of the context helps attenuate the difficulties in attending to middle-context tokens. When/if I have time, I hope to perform a more rigorous assessment of the performance with respect to this specific issue.

  ## Relative Performance (perplexity)
- | Model | Context (tokens) | Perplexity |
- | ---------------------------------------------------- | ----------- | ---------- |
- | TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 512 | **7.42** |
- | TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 512 | 8.86 |
- | **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 512 | 7.94 |
- | ---------------------------------------------------- | ----------- | ---------- |
- | TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 2048 | **5.02** |
- | TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 2048 | 5.98 |
- | **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 2048 | 5.28 |
- | ---------------------------------------------------- | ----------- | ---------- |
- | TheBloke/airoboros-13B-gpt4-1-4-GPTQ | 4096 | 9848.0 |
- | TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 4096 | 5.80 |
- | **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | 4096 | **5.15** |
-
- | Context (tokens) | airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-GPTQ | jondurbin/airoboros-33B-gpt4-1.4-GPTQ |
+
+ | Context (tokens) | airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-33B-gpt4-1.4-GPTQ |
  | ---| ------- | -----| ------ | --- | --- |
- | 512 | 7.62 | | 7.90 | 8.24 | **6.36** |
- | 1024 | 6.20 | | 6.17 | 8.06 | **5.12** |
- | 2048 | 5.38 | | 5.23 | 7.02 | **4.43** |
- | 4096 | 5.08 | | **4.91** | 6.56 | 54.5 |
- | 8192 | 4.90 | | -- | -- | -- |
- | 12000 | 4.82 | | -- | -- | -- |
+ | 512 | 7.62 | 8.24 | 7.90 | **6.36** |
+ | 1024 | 6.20 | 6.71 | 6.17 | **5.12** |
+ | 2048 | 5.38 | 5.87 | 5.23 | **4.43** |
+ | 4096 | 5.08 | 5.50 | **4.91** | 54.5 |
+ | 8192 | 4.90 | 5.32 | -- | -- |
+ | 12000 | 4.82 | 56.1 | -- | -- |

  - This model is competitive with the Llama-1 33b variants, outperforming the best long context model for short sequences.
  - Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of 54.9. While not an appreciable improvement, the fact there wasn't a performance regression despite the context extension is notable.
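
The commit does not show how the perplexity figures above were measured. For readers who want to reproduce numbers of this kind, below is a minimal sketch of the usual strided-perplexity recipe with the Hugging Face `transformers` API. The evaluation corpus (wikitext-2), the stride choice, and the fp16 repository id are assumptions for illustration, not details taken from this commit; the checkpoint's config is assumed to already carry the linear RoPE scaling (position interpolation) used for the 8192-token extension, so no `rope_scaling` override is shown.

```python
# Hypothetical reproduction sketch -- corpus, stride, and repo id are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192"  # assumed fp16 repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# A long evaluation text; the wikitext-2 test split is a common choice.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids


def strided_perplexity(ids: torch.Tensor, context_len: int, stride: int) -> float:
    """Score sliding windows of `context_len` tokens, counting loss only on the
    last `stride` tokens of each window so no token is scored twice."""
    nlls, n_scored = [], 0
    for begin in range(0, ids.size(1) - context_len, stride):
        window = ids[:, begin : begin + context_len].to(model.device)
        labels = window.clone()
        labels[:, :-stride] = -100  # ignore positions already scored in earlier windows
        with torch.no_grad():
            loss = model(window, labels=labels).loss  # mean NLL over scored tokens
        nlls.append(loss * stride)  # approximate total NLL for this window
        n_scored += stride
    return torch.exp(torch.stack(nlls).sum() / n_scored).item()


for ctx in (512, 2048, 4096, 8192):
    ppl = strided_perplexity(input_ids, ctx, stride=ctx // 2)
    print(f"{ctx:>5} tokens: ppl = {ppl:.2f}")
```

Evaluated this way, a model with no context extension would be expected to show the kind of blow-up seen in the `jondurbin/airoboros-33B-gpt4-1.4-GPTQ` column at 4096 tokens, since those positions fall outside its 2048-token trained range.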