Unsupervised fine-tuning

#24
by Dee5969 - opened

Is there a good way to further fine-tune these embeddings on data with no labels?

Dee5969 changed discussion status to closed
Dee5969 changed discussion status to open
Nomic AI org

You can use the contrastors library directly, or use Sentence Transformers 3.
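
As a minimal sketch of the Sentence Transformers route (the nomic-embed-text checkpoints ship custom modeling code, so `trust_remote_code=True` is required, and inputs take task prefixes such as `search_query:` and `search_document:`):

```python
from sentence_transformers import SentenceTransformer

# nomic-embed-text ships custom modeling code, so trust_remote_code is required
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Inputs need task prefixes (e.g. search_query: / search_document:)
embeddings = model.encode([
    "search_document: Embeddings map text to dense vectors.",
    "search_query: what do embeddings do?",
])
print(embeddings.shape)  # (2, 768)
```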

zpn changed discussion status to closed

Would the best way be to take nomic-bert-2048, perform MLM on my domain data (unstructured text), and then replicate the contrastive training with Nomic's provided datasets? Or can nomic-embed-text-v1 be fine-tuned directly on the unstructured text? My data does not contain pairs for contrastive training.

Nomic AI org

Unfortunately, the best approach would be to find pairs for contrastive training. However, you probably only need ~10k pairs, and with those you can fine-tune nomic-embed-text-v1.5 or v1.
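
As a minimal sketch of what that fine-tuning could look like with Sentence Transformers 3, assuming your ~10k pairs are arranged as (anchor, positive) columns; the output path and hyperparameters here are illustrative, not a recommended recipe:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# ~10k (anchor, positive) pairs from your domain; prefixes follow the model's convention
train_dataset = Dataset.from_dict({
    "anchor": ["search_query: example question about your domain"],
    "positive": ["search_document: a passage that answers it"],
})

# In-batch negatives: every other positive in the batch serves as a negative,
# so no explicit negative mining is required
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/nomic-embed-finetuned",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=32,  # larger batches give more in-batch negatives
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

MultipleNegativesRankingLoss suits this setup because it only needs positives; the batch size effectively controls how many negatives each anchor sees.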
