Lite version for DeepSeek-R1?
#137 by haili-tian - opened
Are you working on training or distilling a lite model from DeepSeek-R1? I.e., the model architecture would be the same as DeepSeek-R1, but with smaller tensor shapes, just like DeepSeek-V2-Lite vs. DeepSeek-V2.