---
base_model:
- cognitivecomputations/dolphin-2.9.4-llama3.1-8b
---

# GGUF quants of cognitivecomputations/dolphin-2.9.4-llama3.1-8b

The following quants are provided here:
| Quant  | Size | Perplexity          | Notes |
|--------|------|---------------------|-------|
| IQ4_XS | 4.2G | 8.7992 +/- 0.11237  | Fits into 8GiB VRAM with 4096 context and F16 KV cache |
| Q4_K_M | 4.6G | 8.7948 +/- 0.11223  | Fits into 8GiB VRAM with 4096 context and F16 KV cache; also good for CPU inference on E5-26xx v3/v4 |
| Q8_0   | 8.0G | 8.5970 +/- 0.10933  | imatrix derived from this quant |
| F16    | 15G  | 8.6617 +/- 0.11043  | For 24GiB VRAM; imatrix derived from this quant |
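As a rough sketch of fitting into the 8GiB VRAM budget noted above, a llama.cpp invocation might look like the following (the GGUF file name and prompt are placeholders, not the actual file names in this repo):

```sh
# Hypothetical example: substitute the actual GGUF file from this repo.
# -ngl 99 offloads all layers to the GPU, -c 4096 matches the context size quoted above,
# and -fa enables flash attention (the KV cache defaults to F16).
./llama-cli \
  -m dolphin-2.9.4-llama3.1-8b-IQ4_XS.gguf \
  -ngl 99 \
  -c 4096 \
  -fa \
  -p "Write a haiku about telegraph operators."
```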
Perplexity measured with `-fa -c 2048 -ub 2048` on a UTF-8 text version of ["Wired Love" from Project Gutenberg](http://www.gutenberg.org/ebooks/24353).
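For reference, a perplexity run with those flags via llama.cpp's `llama-perplexity` tool would look roughly like this (the model and text file names are placeholders):

```sh
# Hypothetical invocation: substitute the actual GGUF and the Gutenberg plain-text file.
# -f points at the evaluation text, -c 2048 sets the context window,
# -ub 2048 sets the micro-batch size, and -fa enables flash attention.
./llama-perplexity \
  -m dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf \
  -f wired-love.txt \
  -fa -c 2048 -ub 2048
```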