allura-org/GLM4-9B-Neon-v2

Text Generation

AuriAetherwiing committed on Apr 26

Commit 3251a81 · verified · 1 Parent(s): 2c65aea

Update README.md

Files changed (1)

README.md +1 -1

@@ -23,7 +23,7 @@ Model was trained by Auri.
 **Training notes**
-Model was trained on a dataset consisting of 77M tokens of synthetic RP and short story gen data for one epoch. Training took around 11 hours on 2xRTX 3090 workstation, generously provided by [OwenArli](https://huggingface.co/OwenArli). Went with some sane defaults for training config, QLoRA plus CCE and sequence parallelism for nice chunk of memory usage optimization, 16k fit on 48GB nicely with some room to spare. I seem to have a problem with Eval/Loss being broken, not sure why, otherwise it trained smoothly.
+Model was trained on a dataset consisting of 77M tokens of synthetic RP and short story gen data for one epoch. Training took around 11 hours on 2xRTX 3090 workstation, generously provided by [OwenArli](https://huggingface.co/OwenArli). Went with some sane defaults for training config, QLoRA plus CCE for a nice chunk of memory usage optimization, 16k fit on 48GB nicely with some room to spare. I seem to have a problem with Eval/Loss being broken, not sure why, otherwise it trained smoothly.
 Huge thanks to [ArliAI](https://www.arliai.com/) for providing compute and collaborating on this run!