Update README.md
README.md CHANGED
@@ -60,7 +60,10 @@ Here I explore whether training on long sequences that have clear conceptual dep
 | 8192 | 4.90 | | -- | -- | -- |
 | 12000 | 4.82 | | -- | -- | -- |
 
-
+- This model is competitive with the Llama-1 33b variants, outperforming the best long-context model on short sequences.
+- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of 54.9. While not an appreciable improvement, the fact that there was no performance regression despite the context extension is notable.
+- Perplexity continues to decline out to 12000 tokens, the longest context length I tested due to VRAM constraints.
+-
 
 ## Quantization:
 
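As a reference for how the perplexity figures above can be reproduced, here is a minimal sketch of chunked perplexity evaluation at a fixed context length using Hugging Face `transformers`. The checkpoint name, the evaluation text file, and the non-overlapping windowing are illustrative assumptions, not the exact script behind the table.

```python
# Minimal sketch of perplexity-at-context-length evaluation (illustrative only).
# MODEL_NAME and "eval_text.txt" are placeholders, not the repo's actual eval setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-13b-hf"  # substitute the extended-context checkpoint
CONTEXT_LEN = 8192                        # one of the context lengths in the table

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


def perplexity(text: str, context_len: int = CONTEXT_LEN) -> float:
    """Exponentiated mean next-token loss over non-overlapping windows of context_len tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    losses = []
    for start in range(0, ids.size(0) - 1, context_len):
        window = ids[start : start + context_len].unsqueeze(0).to(model.device)
        if window.size(1) < 2:  # need at least one next-token target
            continue
        with torch.no_grad():
            out = model(window, labels=window)  # HF shifts labels internally
        losses.append(out.loss.float().cpu())
    return torch.exp(torch.stack(losses).mean()).item()


if __name__ == "__main__":
    with open("eval_text.txt") as f:  # any long held-out document
        print(f"ppl @ {CONTEXT_LEN}: {perplexity(f.read()):.2f}")
```

Running the same text with `CONTEXT_LEN` set to 8192 and then 12000 mirrors the comparison in the table above, VRAM permitting.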