Any plans for 32B/70B distilled models?

#83
by NanaBanana22 - opened

Hey. Any plans to distill Qwen3 32B / Llama 70B?

We want this too!

Please no more distills. They just lag so far behind because they use entirely different architectures (in this case, Qwen3).

I'd rather have a DeepSeek R1 Lite. The same model, with the same training data, just scaled down so it can run on consumer hardware.
