Unsupervised fine-tuning
#24
by Dee5969 · opened
Is there a good way to further fine-tune these embeddings on data with no labels?
Dee5969 changed discussion status to closed
Dee5969 changed discussion status to open
You can use the contrastors library directly or use Sentence Transformers 3.
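For example, a minimal Sentence Transformers 3 fine-tuning run could look roughly like the sketch below; the example pair texts, hyperparameters, and output paths are placeholders of mine, not an official recipe:

```python
# Minimal sketch: fine-tuning nomic-embed-text-v1.5 on a small set of
# (anchor, positive) text pairs with Sentence Transformers 3.
# The example pairs, hyperparameters, and output paths are placeholders.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# nomic-embed expects task prefixes on its inputs.
train_dataset = Dataset.from_dict({
    "anchor": ["search_query: example question from my domain"],
    "positive": ["search_document: passage that answers the question"],
})

# Uses in-batch negatives, so only positive pairs are required.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="nomic-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("nomic-finetuned/final")
```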
zpn changed discussion status to closed
Would the best way be to take nomic-bert-2048 and perform MLM on my domain data (unstructured text), and then replicate the contrastive training with Nomic's provided datasets? Or can nomic-embed-text-v1 be fine-tuned directly on the unstructured text? My data does not contain pairs for contrastive training.
Unfortunately, the best approach would be to find pairs for contrastive training. However, you probably only need ~10k pairs and can fine-tune nomic-embed-text-v1.5 or v1.
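If you have no labeled pairs at all, one common workaround (a suggestion of mine, not Nomic's recipe) is to mine weak positives from the unstructured text itself, e.g. by pairing adjacent chunks of the same document, and then train on those with the Sentence Transformers setup above:

```python
# Rough sketch (an assumption, not Nomic's recipe): build weak positive pairs
# from unlabeled documents by pairing consecutive chunks of the same text.
def chunk_words(text, size=256):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def make_pairs(documents):
    """Yield (anchor, positive) pairs from adjacent chunks of each document."""
    for doc in documents:
        chunks = chunk_words(doc)
        for left, right in zip(chunks, chunks[1:]):
            yield ("search_document: " + left, "search_document: " + right)

docs = ["... your unstructured domain text ..."]  # placeholder corpus
pairs = list(make_pairs(docs))
# These pairs can populate the anchor/positive columns of the dataset above.
```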