arxiv:2506.18297

Comparative Analysis of Lion and AdamW Optimizers for Cross-Encoder Reranking with MiniLM, GTE, and ModernBERT

Published on Jun 23, 2025

AI-generated summary

The Lion optimizer improves the ranking effectiveness and GPU efficiency of transformer-based cross-encoder rerankers fine-tuned for information retrieval tasks.

Abstract
Modern information retrieval systems often employ a two-stage pipeline: an efficient initial retrieval stage followed by a computationally intensive reranking stage. Cross-encoders have shown strong effectiveness for reranking due to their deep analysis of query-document pairs. This paper studies the impact of the Lion optimizer, a recent alternative to AdamW, during fine-tuning of cross-encoder rerankers. We fine-tune three transformer models (MiniLM, GTE, and ModernBERT) on the MS MARCO passage ranking dataset using both optimizers. GTE and ModernBERT support extended context lengths (up to 8192 tokens). We evaluate effectiveness on the TREC 2019 Deep Learning Track and the MS MARCO dev set (MRR@10). Experiments, run on the Modal cloud platform, reveal that ModernBERT with Lion achieves the best NDCG@10 (0.7225) and MAP (0.5121) on TREC DL 2019, while MiniLM with Lion ties ModernBERT for MRR@10 (0.5988) on MS MARCO dev. Lion also provides superior GPU efficiency, improving utilization by 2.67% to 10.33% across models. We analyze performance trends using standard IR metrics and discuss the optimizer's impact on training dynamics across architectures.
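For readers unfamiliar with Lion (Chen et al., 2023, "Symbolic Discovery of Optimization Algorithms"), its update replaces AdamW's second-moment rescaling with the sign of an interpolation between a momentum buffer and the current gradient, so it stores one state tensor per parameter instead of AdamW's two, a plausible source of the GPU-efficiency gains reported above. The PyTorch sketch below is an illustrative re-implementation of that published update rule, not the authors' training code; the default hyperparameter values are placeholders.

```python
import torch
from torch.optim import Optimizer


class Lion(Optimizer):
    """Minimal sketch of the Lion update rule (Chen et al., 2023)."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
        defaults = dict(lr=lr, betas=betas, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr = group["lr"]
            beta1, beta2 = group["betas"]
            wd = group["weight_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Single momentum buffer (vs. AdamW's two moment buffers).
                    state["momentum"] = torch.zeros_like(p)
                m = state["momentum"]
                # Decoupled weight decay, as in AdamW.
                p.mul_(1 - lr * wd)
                # Update direction: sign of the beta1-interpolation of
                # momentum and the current gradient.
                p.add_(torch.sign(beta1 * m + (1 - beta1) * p.grad), alpha=-lr)
                # Momentum is an EMA of the gradient with coefficient beta2.
                m.mul_(beta2).add_(p.grad, alpha=1 - beta2)
        return loss
```

A hypothetical fine-tuning setup would attach this to the cross-encoder's parameters, e.g. `optimizer = Lion(model.parameters(), lr=1e-5, weight_decay=0.01)` (values illustrative, not the paper's). The Lion authors recommend a learning rate roughly 3-10x smaller, and a weight decay correspondingly larger, than one would use with AdamW.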
