Unsupervised fine-tuning

#24
by Dee5969 - opened

Is there a good way to further fine-tune these embeddings on data with no labels?

Dee5969 changed discussion status to closed
Dee5969 changed discussion status to open
Nomic AI org

You can use the contrastors library directly, or use Sentence Transformers 3.
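
As a minimal sketch of the Sentence Transformers route (the nomic-embed-text checkpoints ship custom modeling code, so `trust_remote_code=True` is required, and inputs take task prefixes such as `search_query:` and `search_document:`):

```python
from sentence_transformers import SentenceTransformer

# nomic-embed-text ships custom modeling code, so trust_remote_code is required
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Inputs need task prefixes (e.g. search_query: / search_document:)
embeddings = model.encode([
    "search_document: Embeddings map text to dense vectors.",
    "search_query: what do embeddings do?",
])
print(embeddings.shape)  # (2, 768)
```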

zpn changed discussion status to closed

Would the best way be to take nomic-bert-2048, perform MLM on my domain data (unstructured text), and then replicate the contrastive training with Nomic's provided datasets? Or can nomic-embed-text-v1 be fine-tuned directly on the unstructured text? My data does not contain pairs for contrastive training.

Nomic AI org

Unfortunately, the best approach would be to find pairs for contrastive training. However, you probably only need ~10k pairs, and with those you can fine-tune nomic-embed-text-v1.5 or v1.
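
As a minimal sketch of what that fine-tuning could look like with Sentence Transformers 3, assuming your ~10k pairs are arranged as (anchor, positive) columns; the output path and hyperparameters here are illustrative, not a recommended recipe:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# ~10k (anchor, positive) pairs from your domain; prefixes follow the model's convention
train_dataset = Dataset.from_dict({
    "anchor": ["search_query: example question about your domain"],
    "positive": ["search_document: a passage that answers it"],
})

# In-batch negatives: every other positive in the batch serves as a negative,
# so no explicit negative mining is required
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/nomic-embed-finetuned",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=32,  # larger batches give more in-batch negatives
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

MultipleNegativesRankingLoss suits this setup because it only needs positives; the batch size effectively controls how many negatives each anchor sees.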
