How would we go about increasing the fidelity?

#8
by ericleigh007 - opened

As it stands, this model is great for getting ideas that would still need to be developed further, because of the output's lower fidelity.
I assume the problem here is that the token encoder used during training was decidedly low-fi, probably out of a desire to keep the context window smaller and to allow training and inference without a massive GPU farm.
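Just to make the tradeoff I'm guessing at concrete, here's a rough back-of-the-envelope sketch. Purely hypothetical numbers, and I'm assuming an audio-codec-style tokenizer whose codebooks get interleaved into one token stream (I don't know this model's actual config), but the scaling works roughly like this:

```python
# Hypothetical scaling of sequence length with tokenizer fidelity.
# These settings are made up for illustration, not taken from the model.

def tokens_per_clip(seconds: float, frame_rate_hz: float, n_codebooks: int) -> int:
    """Token count for one clip, assuming codebooks are flattened/interleaved."""
    return int(seconds * frame_rate_hz) * n_codebooks

# "Low-fi" setting: 30 s at 25 Hz with 4 codebooks
print(tokens_per_clip(30, 25, 4))   # 3000 tokens

# "Hi-fi" setting: 30 s at 75 Hz with 8 codebooks
print(tokens_per_clip(30, 75, 8))   # 18000 tokens, 6x the context
```

If that's roughly how the tokenizer works, even a modest bump in frame rate or codebook count multiplies the sequence length, which would explain why the trade-off was made the way it was.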

Am I correct? Do we need to completely re-train with better quality samples, or is the problem something else?
Is there a happy medium we could reach where fidelity is much higher without blowing out the resource load?

Let's say we want a local, high-fidelity model that we could run on one or two DIGITS computers, for instance... <3
