Lite version for DeepSeek-R1?

#137
by haili-tian - opened

Are you working on training or distilling a lite model from DeepSeek-R1? That is, a model with the same architecture as DeepSeek-R1 but with smaller tensor shapes, just like DeepSeek-V2-lite vs. DeepSeek-V2.

Sign up or log in to comment