
Rank-R1-v0.2


The following changes were made in v0.2:

  1. The base models were switched from Qwen2.5 to the Qwen3 series.
  2. The training data was changed from Tevatron/msmarco-passage to Tevatron/reasonir-data-hn, which uses the synthetic queries, positive documents, and synthetic hard-negative documents provided by ReasonIR; we further mined additional hard negatives with BM25.
  3. The number of documents in the prompt varies from 2 to 10 during RL training. During inference, we use 10 documents in the prompt (see the usage sketch after the prompt below).
  4. Some GRPO training hyperparameters were changed.
  5. The prompt was changed as shown below:
prompt_system = "You are RankLLM, an intelligent assistant capable of evaluating the relevancy of passages to a given query."

prompt_user = '''You will be presented with a query, and a set of documents.

Your task consists of the following step:

1. Analyze the query: Carefully read the query and identify the core problem or question being asked.

2. Analyze the documents: Thoroughly examine each document and briefly explain how each document is relevant or not relevant to the query.

3. Find the most relevant document: Based on your analysis, select the most relevant document to the query from the set and briefly explain why.

Important: Provide your analysis within the <think> </think> tags and answer only the label of the most relevant document, enclosed in square brackets, within the <answer> </answer> tags. For example, if the third document is the most relevant, your response should be:
<think> Your analysis here </think>
<answer>[3]</answer>

Here is the query: {query}

Here are the documents:
{docs}'''
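
To make the prompt usage concrete, here is a minimal, hypothetical inference sketch using the Hugging Face transformers generation API: it fills the prompt above with a query and up to 10 labelled documents, generates, and parses the label from the <answer> tag. The example query and documents, the decoding settings, and the chat-template handling are assumptions, not the exact training or evaluation pipeline.

```python
# Hypothetical usage sketch; prompt_system and prompt_user are defined above.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ielabgroup/Rank-R1-32B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "What causes coral bleaching?"  # illustrative query
docs = [
    "Coral bleaching occurs when corals expel their symbiotic algae ...",
    "The stock market closed higher on Friday ...",
]  # up to 10 candidate documents at inference time

# Label each candidate document so the model can answer with its index.
docs_text = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
messages = [
    {"role": "system", "content": prompt_system},
    {"role": "user", "content": prompt_user.format(query=query, docs=docs_text)},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=1024, do_sample=False)
completion = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# The model reasons inside <think> ... </think> and answers with the label
# of the most relevant document inside <answer>[k]</answer>.
match = re.search(r"<answer>\s*\[(\d+)\]\s*</answer>", completion)
most_relevant = int(match.group(1)) if match else None
print(most_relevant)
```

Since the prompt selects a single most relevant document, producing a full ranking over a larger candidate pool presumably requires applying this selection repeatedly (e.g. in a setwise/tournament fashion); the exact aggregation procedure is not specified here.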

BRIGHT results

| Method | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a. BM25+GPT4 CoT | 53.6 | 54.1 | 24.3 | 38.7 | 18.9 | 27.7 | 26.3 | 19.3 | 17.6 | 3.9 | 19.2 | 20.8 | 27.0 |
| b. ReasonIR+GPT4 CoT | 43.6 | 42.9 | 32.7 | 38.8 | 20.9 | 25.8 | 27.5 | 31.5 | 19.6 | 7.4 | 33.1 | 35.7 | 29.9 |
| c. a => Rank1-32B | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 | 29.4 |
| d. a => Rank-K-32B | 50.4 | 46.2 | 30.6 | 46.7 | 32.4 | 33.0 | 41.2 | 24.0 | 32.2 | 7.6 | 28.3 | 26.6 | 33.3 |
| e. b => QwenRerank | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 | 36.9 |
| f. a => Rank-R1-v0.2-32B (ours) | 62.3 | 59.3 | 34.1 | 50.7 | 32.4 | 38.9 | 46.3 | 26.6 | 18.1 | 10.6 | 31.1 | 41.2 | 37.6 |
| g. b => Rank-R1-v0.2-32B (ours) | 60.1 | 56.3 | 36.6 | 52.1 | 30.2 | 37.6 | 45.9 | 25.5 | 14.6 | 10.1 | 38.6 | 44.3 | 37.7 |
| h. g + b (Hybrid)^ (ours) | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 | 38.8 |
  • All rerankers rerank using the original query, without the GPT4 CoT expansion.

  • ^ The reranked results are hybridized with the first-stage retrieval results, using min-max score normalization and a 0.1 weight on the first-stage document scores; no extra ranker or retrieval stage is introduced (a minimal sketch follows below).
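
The following is a minimal sketch of that hybrid step, under the assumption that both the reranker and the first-stage retriever expose one score per candidate document; the function names, example scores, and the unit weight on the reranker scores are illustrative assumptions.

```python
# Hypothetical sketch of the hybrid in row h: min-max normalize both score
# lists and add the first-stage scores with a 0.1 weight.
def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_scores(rerank_scores, first_stage_scores, first_stage_weight=0.1):
    reranked = min_max(rerank_scores)
    first = min_max(first_stage_scores)
    # Weight of 1.0 on the reranker scores is an assumption.
    return [r + first_stage_weight * f for r, f in zip(reranked, first)]

# Example: three candidate documents scored by both stages.
fused = hybrid_scores([0.2, 0.9, 0.5], [12.3, 7.1, 9.8])
ranking = sorted(range(len(fused)), key=fused.__getitem__, reverse=True)
print(ranking)  # document indices, best first
```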
