Rank-R1-v0.2
Collection
1 item
•
Updated
The following changes are made in v0.2:
prompt_system = "You are RankLLM, an intelligent assistant capable of evaluating the relevancy of passages to a given query."
prompt_user = '''You will be presented with a query, and a set of documents.
Your task consists of the following step:
1. Analyze the query: Carefully read the query and identify the core problem or question being asked.
2. Analyze the documents: Thoroughly examine each document and briefly explain how each document is relevant or not relevant to the query.
3. Find the most relevant document: Based on your analysis, select the most relevant document to the query from the set and briefly explain why.
Important: Provide your analysis within the <think> </think> tags and answer only the label of the most relevant document, enclosed in square brackets, within the <answer> </answer> tags. For example, if the third document is the most relevant, your response should be:
<think> Your analysis here </think>
<answer>[3]</answer>
Here is the query: {query}
Here are the documents:
{docs}'''
Method | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a. BM25+GPT4 CoT | 53.6 | 54.1 | 24.3 | 38.7 | 18.9 | 27.7 | 26.3 | 19.3 | 17.6 | 3.9 | 19.2 | 20.8 | 27.0 |
b. ReasonIR+GPT4 CoT | 43.6 | 42.9 | 32.7 | 38.8 | 20.9 | 25.8 | 27.5 | 31.5 | 19.6 | 7.4 | 33.1 | 35.7 | 29.9 |
c. a => Rank1-32B | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 | 29.4 |
d. a => Rank-K-32B | 50.4 | 46.2 | 30.6 | 46.7 | 32.4 | 33.0 | 41.2 | 24.0 | 32.2 | 7.6 | 28.3 | 26.6 | 33.3 |
e. b => QwenRerank | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 | 36.9 |
f. a => Rank-R1-v0.2-32B (ours) | 62.3 | 59.3 | 34.1 | 50.7 | 32.4 | 38.9 | 46.3 | 26.6 | 18.1 | 10.6 | 31.1 | 41.2 | 37.6 |
g. b => Rank-R1-v0.2-32B (ours) | 60.1 | 56.3 | 36.6 | 52.1 | 30.2 | 37.6 | 45.9 | 25.5 | 14.6 | 10.1 | 38.6 | 44.3 | 37.7 |
h. g + b (Hybrid)^ (ours) | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 | 38.8 |
All the rerankers rerank using the original query without GPT4 CoT.
^ reranked results hybrid with the first-stage results, with score min-max norm and 0.1 weight on the first-stage document scores, no extra ranker and retrieval is introduced.