Abstract
Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
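The abstract names the mechanism (Adam's second-moment estimate) but not the exact modification, so the following is only a minimal sketch of one plausible reading of "coupling": sharing a single second-moment estimate per hidden dimension across the whole embedding matrix, so that rare and frequent tokens receive the same adaptive scaling. The class name `CoupledAdam`, the `coupled` param-group flag, and the hyperparameters are illustrative assumptions, not the paper's API; see the linked repository below for an attempted faithful implementation.

```python
import torch
from torch.optim import Optimizer


class CoupledAdam(Optimizer):
    """Sketch of an Adam variant in which the embedding matrix shares one
    second-moment estimate per hidden dimension across the vocabulary.
    This is an assumption-laden illustration, not the paper's reference code."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        # closure accepted only for API compatibility with torch optimizers
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)     # first moment m
                    state["exp_avg_sq"] = torch.zeros_like(p)  # second moment v
                state["step"] += 1
                m, v = state["exp_avg"], state["exp_avg_sq"]
                g = p.grad

                # Standard Adam moment updates
                m.mul_(beta1).add_(g, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(g, g, value=1 - beta2)

                # Assumed "coupling": for the (vocab, dim) embedding matrix,
                # average v over the vocabulary axis so every token embedding
                # receives the same adaptive scaling per hidden dimension.
                if group.get("coupled", False):
                    v_used = v.mean(dim=0, keepdim=True).expand_as(v)
                else:
                    v_used = v

                # Bias-corrected Adam update
                bias1 = 1 - beta1 ** state["step"]
                bias2 = 1 - beta2 ** state["step"]
                denom = (v_used / bias2).sqrt().add_(group["eps"])
                p.addcdiv_(m / bias1, denom, value=-group["lr"])


# Hypothetical usage: enable coupling only for the embedding weight.
# optimizer = CoupledAdam(
#     [
#         {"params": [model.get_input_embeddings().weight], "coupled": True},
#         {"params": [p for n, p in model.named_parameters() if "embed" not in n]},
#     ],
#     lr=3e-4,
# )
```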
Community
TLDR: We introduce Coupled Adam to mitigate the problem of anisotropic embeddings in LLMs
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Enhancing Lexicon-Based Text Embeddings with Large Language Models (2025)
- ReNeg: Learning Negative Embedding with Reward Guidance (2024)
- Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning (2025)
- Momentum Contrastive Learning with Enhanced Negative Sampling and Hard Negative Filtering (2025)
- Shrink the longest: improving latent space isotropy with symplicial geometry (2025)
- Astromer 2 (2025)
- Statistical Coherence Alignment for Large Language Model Representation Learning Through Tensor Field Convergence (2025)
@flxst I tried implementing the paper here: https://github.com/llmsresearch/coupledadam. Please let me know if you think it does not implement the paper correctly, and I will update it.