Awan LLM
committed on
Commit • 88003dd
1 Parent(s): 78847a3
Update README.md
README.md CHANGED
@@ -15,6 +15,11 @@ In terms of reasoning and intelligence, this model is probably worse than the OG
 Will soon have quants uploaded here on HF and have the model up on our site https://awanllm.com for anyone to try.
 
 
+OpenLLM Benchmark:
+
+![OpenLLM Leaderboard](https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Cumulus-v0.2/blob/main/Screenshot%202024-05-02%20201231.png "OpenLLM Leaderboard")
+
+
 Training:
 - Trained at a 4096 sequence length, while the base model has an 8192 sequence length. From testing, it still handles the full 8192 context just fine.
 - Training took around 3 days on an RTX 4090, using 4-bit loading and QLoRA with rank 64 and alpha 128, resulting in ~2% trainable weights.
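
The training bullets above describe a QLoRA fine-tune: 4-bit base-model loading, LoRA rank 64, alpha 128, roughly 2% trainable weights, and a 4096-token sequence length. As a rough illustration, here is a minimal sketch of that configuration using the Hugging Face transformers and peft libraries; the base model name, target modules, dropout, and NF4 quantization details are assumptions for the example, not settings confirmed by this commit.

```python
# Minimal QLoRA setup sketch matching the README's stated hyperparameters
# (4-bit loading, rank 64, alpha 128, 4096 sequence length). All other
# specifics below are assumptions, not taken from the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model
MAX_SEQ_LEN = 4096                                   # training sequence length from the README

# 4-bit quantized loading of the base model (quant type and compute dtype are assumptions)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.model_max_length = MAX_SEQ_LEN

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter with rank 64 and alpha 128 as stated; the target module list is assumed
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
model = get_peft_model(model, lora_config)

# Reports the trainable-parameter share; with these settings on an 8B model it
# lands around the "~2% trainable weights" figure mentioned in the README.
model.print_trainable_parameters()
```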