Base version?
Hi, I like what you've done here with this prune. I ran one training pass on it (a normal train that worked as expected), and it seems to have taken to it pretty well without losing much relative to the 30B. But like the 30B, it looooves to repeat itself, constantly.
Any chance you'd be willing to replicate the process on Qwen3-30B-A3B-Base? After seeing Gryphe's success training on base, I think that would be a good thing to experiment with.
+1
I'm also interested in a base version, since Qwen's instruct models are deep-fried, with serious repetition problems past 4096 tokens of context.
Gryphe's recent 30B-A3B finetune on the base version, and its actually functional long context, proves, I think, that Qwen's instruct tuning is the problem.
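For anyone stuck with the repetition loops in the meantime, a sampling-side workaround (not a fix for the underlying tuning issue) is to raise `repetition_penalty` and set `no_repeat_ngram_size` at generation time. A minimal transformers sketch; the repo id here is a placeholder, not the actual model:

```python
# Sampling-side mitigation for repetition loops (workaround, not a cure).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/qwen3-prune"  # placeholder: swap in the real repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,   # penalize tokens the model has already emitted
    no_repeat_ngram_size=4,   # hard-block verbatim 4-gram loops
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

In my experience this only papers over the looping past long contexts, which is why a base-version prune to finetune from would be the real fix.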
+1
Also interested. I'd make a restore model out of it, then retrain and experiment with use cases.