GLM 4 32B too?

#2
by qingy2024 - opened

Hi @alamios !

This model works really well. Are you planning to create an equivalent 0.5B draft model for the new GLM 4-32B-0414 model?

Hey! I'll take a look at it, but don't expect it anytime soon; there are too many things in my backlog.

I will give it a try, but it will take a couple of days:

I'm currently creating a 12-head (~0.4B parameter) distilled version of Qwen2.5-0.5B-Instruct that I can reuse for future draft models (i.e., instead of having to trim to 12 heads and then retrain from scratch every time for a new model).

Once I have this, I should be able to create draft models using far less data (hopefully). I can generate around 0.5B tokens per day using 7 GPUs, and hope to have at least 2B tokens for creating the distilled version.

If it works, then I'll try GLM 4-32B-0414 first, as it seems like a good test case: there are no tiny models in that family available to use as a draft.

@alamios @jukofyork Thanks, looking forward to it!
