Same thing as the other one, but I threw an additional ~10M tokens at it, for roughly ~20M total across two training runs. Same datasets, just different samples drawn from them. It feels better now. Next stage: more creative fine-tuning. Instruct format is ChatML. Tested with sampling parameters: `temperature = 0.7, top-A = 0.2`.
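For reference, here is a minimal sketch of assembling a ChatML-formatted prompt, assuming the standard `<|im_start|>`/`<|im_end|>` delimiters; the system prompt text below is a placeholder, not necessarily what was used in training:

```python
# Minimal ChatML prompt builder. The delimiters follow the standard
# ChatML convention; the system prompt is only an example.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # generation continues from here
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Write a short poem about autumn.",
)
print(prompt)
```

Note that `top-A` isn't a standard `transformers` generation parameter; it's exposed by some sampling backends (e.g. KoboldAI), so apply the tested settings in whichever frontend supports them.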