bhenrym14 committed
Commit f38c375 (1 parent: f04efda)

Update README.md

Files changed (1): README.md (+4, -1)
README.md CHANGED
@@ -60,7 +60,10 @@ Here I explore whether training on long sequences that have clear conceptual dep
 | 8192 | 4.90 | | -- | -- | -- |
 | 12000 | 4.82 | | -- | -- | -- |
 
-
+- This model is competitive with the Llama-1 33b variants, outperforming the best long context model for short sequences.
+- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of 54.9. While not an appreciable improvement, the fact that there wasn't a performance regression despite the context extension is notable.
+- Perplexity continues to decline to 12000 tokens, the longest context length I tested due to VRAM constraints.
+-
 
 ## Quantization:
 
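
For reference, long-context perplexity numbers like the 8192- and 12000-token figures in the table above can be measured with a simple evaluation loop over a long document. Below is a minimal sketch using Hugging Face transformers; it is not the author's evaluation script, and the model id, input document path, and the `perplexity` helper are placeholders introduced only for illustration.

```python
# Minimal sketch, assuming a causal LM checkpoint on the Hugging Face Hub.
# MODEL_ID and the helper below are illustrative placeholders, not from this repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-namespace/your-long-context-llama"  # placeholder repo id
CONTEXT_LEN = 12000  # longest context length reported in the table above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


def perplexity(text: str, seq_len: int = CONTEXT_LEN) -> float:
    """Perplexity of the model over the first `seq_len` tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :seq_len].to(model.device)
    with torch.no_grad():
        # Passing the inputs as labels makes the model compute the mean
        # next-token cross-entropy internally (labels are shifted for us).
        loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(loss).item()


# Example usage with any sufficiently long document (placeholder path):
# print(perplexity(open("long_document.txt").read()))
```

Evaluating a 13b model in fp16 at 12000 tokens in a single forward pass requires substantial GPU memory, which is consistent with the VRAM constraint mentioned in the added notes.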