The Precision Difference in QKV Projection Weights: FP4 vs. BF16 in DeepSeek R1 FP4 Model

#2
by yoursmin - opened

Hello, I noticed that in your model weights, the QKV projection weights in the attention module are stored in FP4, whereas the corresponding weights in the DeepSeek R1 FP4 model are kept in BF16. Could you explain whether there is a specific reason or special consideration behind this difference? Thank you for your excellent work; I look forward to your reply.
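For context, FP4 here usually means the E2M1 format (1 sign, 2 exponent, 1 mantissa bit), which has only 8 representable magnitudes per sign, versus BF16's full 16-bit float range. A minimal sketch of what mapping a weight onto that grid looks like (the per-tensor scale is a simplification of real recipes, which typically use per-block scales, and the helper names are hypothetical):

```python
# Illustrative E2M1 ("FP4") quantization sketch; not the model's actual kernel.
# The representable non-negative magnitudes in E2M1:
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable E2M1 value (sign-magnitude)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(E2M1_GRID, key=lambda v: abs(v - abs(x)))
    return sign * mag

weights = [0.07, -1.2, 2.6, 5.1]
# A scale maps weights into the FP4 range before rounding (6.0 = max |E2M1|).
scale = max(abs(w) for w in weights) / 6.0
dequantized = [quantize_fp4(w / scale) * scale for w in weights]
print(dequantized)
```

The coarse grid is why sensitive tensors (like attention projections) are sometimes left in BF16: each FP4 weight costs 4 bits instead of 16, but small values collapse onto very few representable points.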

