brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity

brucethemoose committed on Dec 12, 2023

Commit 1b2c2a8 · 1 Parent(s): 1c6cd68

Update README.md

Files changed (1)

README.md +1 -1

@@ -76,7 +76,7 @@ To load this in full-context backends like transformers and vllm, you *must* cha
 Various densities were tested with perplexity tests and long context prompts. Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.
-This particular version is merged with more than the "recommended" max density of 0.5. It seems to result in even better perplexity, but I'm not sure if this translates to better output.
+This particular version is merged with more than the "recommended" max density of 0.5. It seems to result in even better perplexity, and a much higher position on the hf leaderboard, but I'm not sure if this translates to better output.
 Weights that add up to 1 seems to be optimal.