Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
Abstract
While Large Language Models show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability but often underperform without specialized optimization. In this paper, we explore the enhancement of smaller language models through improvements to their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, and perform contrastive fine-tuning on the NLI dataset. Our results demonstrate that this fine-tuning method enhances the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the most significant improvement: an average performance gain of 56.33%. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.
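For context, the snippet below is a minimal sketch of supervised contrastive fine-tuning on NLI-style triplets with an InfoNCE objective. The checkpoint name, last-token pooling, temperature, and the example triplet are illustrative assumptions rather than the paper's exact recipe; the linked repository is the authoritative reference.

```python
# Minimal sketch of InfoNCE-style contrastive fine-tuning on NLI triplets.
# The checkpoint, pooling choice, temperature, and example data are assumptions
# for illustration; see the authors' repository for the exact setup.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "openbmb/MiniCPM-2B-dpo-bf16"  # assumed checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def embed(texts):
    """Encode texts and use the last non-padding token's hidden state as the embedding."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state              # (B, T, H)
    last = batch["attention_mask"].sum(dim=1) - 1           # index of last real token
    return hidden[torch.arange(hidden.size(0)), last]       # (B, H)

def info_nce(anchor, positive, negative, temperature=0.05):
    """InfoNCE with in-batch negatives plus one hard negative per anchor."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negative, dim=-1)
    logits = torch.cat([a @ p.T, (a * n).sum(-1, keepdim=True)], dim=1) / temperature
    labels = torch.arange(a.size(0), device=a.device)        # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# One illustrative step on an NLI triplet (premise, entailment, contradiction).
premises = ["A man is playing a guitar on stage."]
entailments = ["A person is performing music."]
contradictions = ["The stage is completely empty."]
loss = info_nce(embed(premises), embed(entailments), embed(contradictions))
loss.backward()
```

In this formulation, entailment sentences serve as positives and contradiction sentences as hard negatives, which is the standard supervised setup for contrastive sentence-embedding training on NLI data.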
Community
Please cite the source of the prompt. This paper uses the same prompt as PromptEOL [1], but this is not mentioned in the paper.
Prompt in PromptEOL (sketched below).
[1] Jiang T., Huang S., Luan Z., et al. Scaling Sentence Embeddings with Large Language Models. arXiv preprint arXiv:2307.16645, 2023.
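For readers unfamiliar with PromptEOL, the idea is to wrap the input sentence in a fixed template and take the hidden state of the final prompt token as the sentence embedding. The snippet below is a hedged illustration: the template wording and the choice of checkpoint (here "microsoft/phi-2") are assumptions, so the exact prompt should be checked against [1] and the paper's repository.

```python
# Illustration of PromptEOL-style embedding extraction described in [1].
# Template wording and model choice are assumptions for illustration only.
import torch
from transformers import AutoModel, AutoTokenizer

PROMPT = 'This sentence : "{text}" means in one word:"'

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
model = AutoModel.from_pretrained("microsoft/phi-2", trust_remote_code=True)

@torch.no_grad()
def prompteol_embedding(text: str) -> torch.Tensor:
    """Return the hidden state of the last prompt token as the sentence embedding."""
    inputs = tokenizer(PROMPT.format(text=text), return_tensors="pt")
    hidden = model(**inputs).last_hidden_state   # (1, T, H)
    return hidden[0, -1]                         # last token summarizes the sentence

emb = prompteol_embedding("A man is playing a guitar on stage.")
print(emb.shape)
```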
Thank you for bringing this to our attention! We've added the necessary citation in the latest version of the paper. We appreciate your interest and apologize for the oversight. 🙏🏻 Your feedback helps us improve, and we're grateful for your support. ✨
Hello! This is our open-topic final project for the NLP course, and we're delighted to share the insights we've gained 🤗. Thank you for your comment :D
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe (2024)
- ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation (2024)
- D2LLM: Decomposed and Distilled Large Language Models for Semantic Search (2024)
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval (2024)
- LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models (2024)