RWKV-7 models trained on the Pile with the "20b tokenizer" (332,115,325,534 tokens).

0.1B = L12-D768, lr 8e-4 to 3e-5 cosine decay, wd 0.1, bsz 8x30x4096

0.4B = L24-D1024, lr 6e-4 to 2e-5 cosine decay, wd 0.1, bsz 8x30x4096

1.5B = L24-D2048, lr 5e-4 to 1.5e-5 cosine decay, wd 0.1, bsz 8x45x4096
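
To make the hyperparameters above concrete, here is a minimal sketch that turns them into tokens per optimizer step and a cosine learning-rate curve. It rests on assumptions not stated in this card: that the three `bsz` factors multiply to the number of tokens per step (the last factor, 4096, presumably being the context length), that the full token count is consumed in a single pass, and that the decay follows the textbook cosine form over training progress with no warmup. The names `CONFIGS`, `tokens_per_step`, and `cosine_lr` are illustrative only and are not taken from the RWKV-LM code.

```python
import math

# Total training tokens, as listed at the top of this card.
TOTAL_TOKENS = 332_115_325_534

# Values copied from the lines above; the dict layout itself is illustrative.
CONFIGS = {
    "0.1B": {"lr_init": 8e-4, "lr_final": 3e-5, "bsz": (8, 30, 4096)},
    "0.4B": {"lr_init": 6e-4, "lr_final": 2e-5, "bsz": (8, 30, 4096)},
    "1.5B": {"lr_init": 5e-4, "lr_final": 1.5e-5, "bsz": (8, 45, 4096)},
}


def tokens_per_step(bsz):
    """Assumption: the bsz factors multiply to the tokens seen per optimizer step."""
    return math.prod(bsz)


def cosine_lr(progress, lr_init, lr_final):
    """Textbook cosine decay from lr_init (progress=0) to lr_final (progress=1)."""
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        tps = tokens_per_step(cfg["bsz"])
        approx_steps = TOTAL_TOKENS / tps  # assumes a single pass over all tokens
        lr_mid = cosine_lr(0.5, cfg["lr_init"], cfg["lr_final"])
        print(f"{name}: {tps} tokens/step, ~{approx_steps:.0f} steps, lr at 50% = {lr_mid:.3e}")
```

Under these assumptions, 8x30x4096 works out to 983,040 tokens per step and 8x45x4096 to 1,474,560. The actual schedule used for training may differ in details such as warmup; the RWKV-LM repository linked below is the authority.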

Check https://github.com/BlinkDL/RWKV-LM for details.

How to run it:

https://pypi.org/project/rwkv/

or

https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v7
