The Precision Difference in QKV Projection Weights: FP4 vs. BF16 in DeepSeek R1 FP4 Model

#2
by yoursmin - opened

Hello, I noticed that in your model weights, the QKV projection weights in the attention module are stored in FP4, whereas the corresponding weights in the DeepSeek R1 FP4 model are kept in BF16. Could you explain whether there is a specific reason or special consideration behind this difference? Thank you for your excellent work; I look forward to your reply.
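For context, FP4 here usually means the E2M1 format (1 sign, 2 exponent, 1 mantissa bit), which has only 8 representable magnitudes per sign, versus BF16's full 16-bit float range. A minimal sketch of what mapping a weight onto that grid looks like (the per-tensor scale is a simplification of real recipes, which typically use per-block scales, and the helper names are hypothetical):

```python
# Illustrative E2M1 ("FP4") quantization sketch; not the model's actual kernel.
# The representable non-negative magnitudes in E2M1:
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable E2M1 value (sign-magnitude)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(E2M1_GRID, key=lambda v: abs(v - abs(x)))
    return sign * mag

weights = [0.07, -1.2, 2.6, 5.1]
# A scale maps weights into the FP4 range before rounding (6.0 = max |E2M1|).
scale = max(abs(w) for w in weights) / 6.0
dequantized = [quantize_fp4(w / scale) * scale for w in weights]
print(dequantized)
```

The coarse grid is why sensitive tensors (like attention projections) are sometimes left in BF16: each FP4 weight costs 4 bits instead of 16, but small values collapse onto very few representable points.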

