Thank you so much for this great article!
It lays out all the concepts in a very simple manner!
But I have one question, or rather an analogy, to ask:
When we talk about KL divergence, we compare the distributions as a whole, not some pointwise distance between vectors or the like.
With forward KLD, the mode/peak-covering behaviour stands out. So during quantization, when you say we keep the quantized model close to the original one, are we relying on that mode-covering behaviour?
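To make my question concrete, here is a minimal sketch of what I mean by "the distribution as a whole": forward KL between the original model's next-token distribution P and the quantized model's Q, which penalizes Q wherever P has mass. The logits here are just placeholders, not from any actual model.

```python
import torch
import torch.nn.functional as F

# Placeholder logits over the vocabulary for the same input
# (in practice these would come from the full-precision and quantized models).
logits_orig = torch.randn(1, 32000)    # original (full-precision) model
logits_quant = torch.randn(1, 32000)   # quantized model

p = F.softmax(logits_orig, dim=-1)            # reference distribution P
log_q = F.log_softmax(logits_quant, dim=-1)   # log Q from the quantized model

# Forward KL: D_KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x)).
# Minimizing this pushes Q to put probability everywhere P does,
# which is the covering behaviour I'm asking about.
forward_kl = (p * (p.log() - log_q)).sum(dim=-1)
print(forward_kl)
```

Is this roughly the quantity the article has in mind when it says the quantized model should preserve the original model's distribution?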
