Commit · 1b2c2a8
1 Parent(s): 1c6cd68
Update README.md
README.md
CHANGED
@@ -76,7 +76,7 @@ To load this in full-context backends like transformers and vllm, you *must* cha
 
 Various densities were tested with perplexity tests and long context prompts. Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.
 
-This particular version is merged with more than the "recommended" max density of 0.5. It seems to result in even better perplexity, but I'm not sure if this translates to better output.
+This particular version is merged with more than the "recommended" max density of 0.5. It seems to result in even better perplexity, and a much higher position on the hf leaderboard, but I'm not sure if this translates to better output.
 
 Weights that add up to 1 seems to be optimal.
 