Same thing as the other one, but I threw an additional ~10M tokens at it, for roughly ~20M total across two training runs. Same datasets, just different samples drawn from them. It feels better now. Next stage: more creative fine-tuning. Instruct format is ChatML. Tested with sampling parameters: `temperature = 0.7, top-A = 0.2`.
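For reference, here is a minimal sketch of assembling a ChatML-formatted prompt, assuming the standard `<|im_start|>`/`<|im_end|>` delimiters; the system prompt text below is a placeholder, not necessarily what was used in training:

```python
# Minimal ChatML prompt builder. The delimiters follow the standard
# ChatML convention; the system prompt is only an example.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # generation continues from here
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Write a short poem about autumn.",
)
print(prompt)
```

Note that `top-A` isn't a standard `transformers` generation parameter; it's exposed by some sampling backends (e.g. KoboldAI), so apply the tested settings in whichever frontend supports them.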