Base version?
Hi, I like what you've done here with this prune. I ran one training pass on it (a normal train that worked as expected), and it seems to have taken to it pretty well without losing much relative to the 30B. But like the 30B, it looooves to repeat itself, constantly.
Any chance you'd be willing to replicate the process on Qwen3-30B-A3B-Base? After seeing Gryphe's success training on base, I think that would be a good thing to experiment with.
+1
I'm also interested in a base version, since Qwen's instruct models are deep-fried, with serious repetition problems past 4096 tokens of context.
Gryphe's recent 30B-A3B finetune on the base version, and its actually functional long context, proves, I think, that Qwen's instruct tuning is the problem.
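For anyone stuck with the repetition loops in the meantime, a sampling-side workaround (not a fix for the underlying tuning issue) is to raise `repetition_penalty` and set `no_repeat_ngram_size` at generation time. A minimal transformers sketch; the repo id here is a placeholder, not the actual model:

```python
# Sampling-side mitigation for repetition loops (workaround, not a cure).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/qwen3-prune"  # placeholder: swap in the real repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,   # penalize tokens the model has already emitted
    no_repeat_ngram_size=4,   # hard-block verbatim 4-gram loops
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

In my experience this only papers over the looping past long contexts, which is why a base-version prune to finetune from would be the real fix.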
+1
Also interested. I'd make a restore model out of it, then retrain and experiment with use cases.