FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Abstract
We introduce FreshStack, a reusable framework for automatically building information retrieval (IR) evaluation benchmarks from community-asked questions and answers. FreshStack conducts the following steps: (1) automatic corpus collection from code and technical documentation, (2) nugget generation from community-asked questions and answers, and (3) nugget-level support, retrieving documents using a fusion of retrieval techniques and hybrid architectures. We use FreshStack to build five datasets on fast-growing, recent, and niche topics to ensure the tasks are sufficiently challenging. On FreshStack, existing retrieval models, when applied out-of-the-box, significantly underperform oracle approaches on all five topics, denoting plenty of headroom to improve IR quality. In addition, we identify cases where rerankers do not clearly improve first-stage retrieval accuracy (two out of five topics). We hope that FreshStack will facilitate future work toward constructing realistic, scalable, and uncontaminated IR and RAG evaluation benchmarks. FreshStack datasets are available at: https://fresh-stack.github.io.
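Step (3) of the abstract mentions retrieving documents with a fusion of retrieval techniques, but it does not say which fusion method is used. As a minimal sketch only, the snippet below uses reciprocal rank fusion (RRF), a common choice swapped in here for illustration, to merge ranked lists from two hypothetical retrievers. The function name, the document IDs, and the constant k=60 are assumptions and are not taken from FreshStack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Assumption: RRF stands in for the unspecified fusion method.
    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical and a dense retriever for one question.
lexical_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused)
```

Documents that both retrievers return (here d1 and d3) rise above documents that only one retriever finds, which is the usual motivation for pooling candidates from several systems before judging them.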
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- From Retrieval to Generation: Comparing Different Approaches (2025)
- Optimizing open-domain question answering with graph-based retrieval augmented generation (2025)
- Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG (2025)
- Judging the Judges: A Collection of LLM-Generated Relevance Judgements (2025)
- SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA (2025)
- VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents (2025)
- Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets (2025)
Really nice to see your continued work and open-source contributions, especially now that this topic is really gaining traction. Looking forward to adding this to my toolbox and to the code release, hopefully as an easy-to-use command-line tool :-)