Pythia 160M, Mamba 130M, and RWKV 169M models trained on OpenWebText for 4,000 steps (context window: 1,024 tokens; effective batch size: 512), with 6 random seeds per architecture.
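As a rough illustration of the training budget these settings imply, the sketch below computes the number of tokens per optimizer step and per run. It assumes the effective batch size of 512 counts full 1,024-token sequences with no padding, which the description does not state explicitly. The constant and function names are hypothetical and are not part of any released training code.

```python
# Back-of-the-envelope token budget for the runs described above.
# Assumption (not stated in the description): the effective batch size of 512
# counts full-length sequences of 1024 tokens, with no padding.

STEPS = 4000
CONTEXT_WINDOW = 1024        # tokens per sequence
EFFECTIVE_BATCH_SIZE = 512   # sequences per optimizer step (assumed unit)
SEEDS_PER_MODEL = 6
ARCHITECTURES = ["Pythia 160M", "Mamba 130M", "RWKV 169M"]


def tokens_per_step(batch_size: int, context_window: int) -> int:
    """Tokens consumed by one optimizer step, assuming no padding."""
    return batch_size * context_window


def tokens_per_run(steps: int, batch_size: int, context_window: int) -> int:
    """Total training tokens seen by a single run."""
    return steps * tokens_per_step(batch_size, context_window)


if __name__ == "__main__":
    per_step = tokens_per_step(EFFECTIVE_BATCH_SIZE, CONTEXT_WINDOW)
    per_run = tokens_per_run(STEPS, EFFECTIVE_BATCH_SIZE, CONTEXT_WINDOW)
    n_runs = len(ARCHITECTURES) * SEEDS_PER_MODEL
    print(f"tokens per step: {per_step:,}")  # 524,288
    print(f"tokens per run:  {per_run:,}")   # 2,097,152,000
    print(f"number of runs:  {n_runs}")      # 18
```

Under that assumption, each run sees about 2.1 billion tokens (524,288 tokens per step). The collection comprises 18 runs in total: 3 architectures × 6 seeds.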
James Michaelov (jmichaelov)