More training details?

#15 by elepedus

Hi,

Thanks so much for the model and the very well-written, approachable technical report. It's great to see continuing work on BitNet!

I wonder if you could share any more details about the training, especially regarding cost/resource utilisation and how it compares to un-quantised training runs? Naively, I would expect meaningful efficiency gains at training time as well, but it would be great to get some concrete numbers.

Thanks in advance,
Ed

I don't think there is any benefit in training efficiency, since it still uses full precision during the training stage.

Really??

I thought the whole point of BitNet was that the ternary “quantisation” is applied during training, rather than afterwards. Otherwise why even bother training a new model in the first place?

The authors could have just applied post-training ternary quantisation to an existing model like Phi-4. That would probably have been quicker and cheaper, and it would have made clear that this is "just" post-training quantisation, giving a more direct baseline to compare against.
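For what it's worth, both points can be true at once: BitNet-style quantisation-aware training ternarises the weights on the fly in the forward pass, but the optimiser still updates a full-precision latent copy, so training cost stays close to an un-quantised run. Here's a minimal PyTorch sketch of that pattern, simplified from the absmean scheme described in the BitNet b1.58 papers; the `ternary_quantize` helper and `BitLinear` class are illustrative names, and activation quantisation and normalisation are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor absmean scale (illustrative simplification).
    scale = w.abs().mean().clamp(min=1e-5)
    # Round to {-1, 0, +1} and rescale: the forward pass sees ternary weights.
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: gradients flow to the full-precision latent w.
    return w + (w_q - w).detach()

class BitLinear(nn.Linear):
    """Linear layer whose weight is ternarised on every forward pass.
    self.weight stays in full precision, which is why training itself
    sees little or no efficiency gain."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, ternary_quantize(self.weight), self.bias)

layer = BitLinear(4, 4)
out = layer(torch.randn(2, 4))  # forward uses ternary weights; backward updates FP latents
```

The efficiency win only shows up at inference, once the latent weights are dropped and only the packed ternary values are kept.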
