Faiss k-means layer and centroids
Hi, which layer was the faiss index that is shared here trained on? Was it the last layer?
And a second question: is it possible to extract plain k-means centroids from the index and run vanilla k-means inference, e.g. with sklearn?
Thank you!
Hello,
Thanks for the interest in this model.
The third iteration of mHuBERT-147 (the final one) was trained using a faiss index learned from features extracted from the 9th layer of the second-iteration model.
I do not think you could transfer the faiss clustering to sklearn once it is learned, since faiss operates with vector compression.
You can find more details on the faiss clustering in the paper (https://arxiv.org/pdf/2406.06371), and some training and inference scripts are available here (https://github.com/utter-project/mHuBERT-147-scripts/tree/main/03_faiss_indices).
If your goal is to use mHuBERT-147 as a feature extractor, followed by quantization, I would recommend training a new clustering model of your preference on a later layer of the model (e.g. the 11th). The faiss index we made available is the one used to train this specific model, so it is not suited to this task.
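A minimal sketch of that suggestion, using sklearn since it was mentioned in the question. It is only an illustration, not the procedure used to build the released index. The random `features` array is a stand-in for frame-level hidden states taken from a later layer. The feature dimension (768), the number of clusters (50) and all variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for real features. In practice this array would hold frame-level
# hidden states from the chosen layer, with shape (num_frames, hidden_dim).
# The 768 dimension and the random values are assumptions for illustration.
rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 768)).astype(np.float32)

# The number of clusters is an illustrative choice, not a recommended value.
num_clusters = 50
kmeans = KMeans(n_clusters=num_clusters, n_init=1, random_state=0)
kmeans.fit(features)

# The learned centroids are a plain NumPy array, one row per cluster.
centroids = kmeans.cluster_centers_

# Quantize new frames into discrete unit ids (nearest centroid per frame).
units = kmeans.predict(features[:10])
```

Because the centroids here are an ordinary array, they can be saved with `np.save` and reused for nearest-centroid assignment without any faiss dependency.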
I see, okay thank you very much.