Model Request: Qwen 3 Series

#4
by DongmingShenDS - opened

Can we llamafy the Qwen3 series?

Any in particular you are interested in?

I will give it a shot, but I'm afraid Qwen3 may diverge from llama enough to make it nontrivial.

Thanks for the feedback.

Actually, upon further inspection, Qwen3's attention does seem to differ quite a lot from Llama3's. It might not be worth the effort to support llamafying it yet...
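To make the attention difference concrete, here is a toy NumPy sketch. It assumes the difference in question is the per-head RMSNorm that Qwen3 applies to queries and keys (QK-norm), which Llama3 does not have. The sizes, weights, and gain values are made up for illustration, and RoPE is left out. It is only meant to show why that step cannot simply be folded into Llama-style projection weights.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Normalize over the last axis (head_dim), then apply a per-channel gain.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * gain

rng = np.random.default_rng(0)
hidden, head_dim = 16, 8                       # toy sizes, a single head
w_q = rng.normal(size=(hidden, head_dim))      # hypothetical projections
w_k = rng.normal(size=(hidden, head_dim))
q_gain = rng.uniform(0.5, 1.5, size=head_dim)  # hypothetical norm gains
k_gain = rng.uniform(0.5, 1.5, size=head_dim)

def logit_llama_style(x_q, x_k):
    # Plain projections: the score is bilinear in the two inputs.
    return (x_q @ w_q) @ (x_k @ w_k)

def logit_qk_norm(x_q, x_k):
    # Projections followed by RMSNorm on q and k before the dot product.
    return rms_norm(x_q @ w_q, q_gain) @ rms_norm(x_k @ w_k, k_gain)

x_q = rng.normal(size=hidden)
x_k = rng.normal(size=hidden)

# Scale the query-side input by 3 and compare how each score responds.
llama_ratio = logit_llama_style(3 * x_q, x_k) / logit_llama_style(x_q, x_k)
qk_norm_ratio = logit_qk_norm(3 * x_q, x_k) / logit_qk_norm(x_q, x_k)

print(f"llama-style ratio: {llama_ratio:.6f}")
print(f"qk-norm ratio:     {qk_norm_ratio:.6f}")
```

The plain score scales with the input while the normalized one stays essentially unchanged, so no fixed replacement for `w_q` and `w_k` can reproduce the normalized behavior for all inputs. Under that assumption, a weight-only conversion would change the model's outputs.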

Yeah, perhaps.

Is there anything in particular that doesn't support Qwen3 and would need llamafication to work? Qwen3 seems really popular to me, and the main reason to llamafy a model is to use it with frameworks that don't support its architecture.

I think it is theoretically possible, but it would require significant retraining/post-training to heal the model and make it coherent. I guess it is a cost/benefit tradeoff.