DDRO-Generative-Document-Retrieval

This collection contains four generative retrieval models trained using Direct Document Relevance Optimization (DDRO), a lightweight alternative to reinforcement learning for aligning docid generation with document-level relevance through pairwise ranking.

The models are trained on two benchmark datasets, MS MARCO (MS300K) and Natural Questions (NQ320K), with two types of document identifiers (sketched in code below):

  • PQ (Product Quantization): captures deep semantic features for complex queries.
  • TU (Title + URL): leverages surface-level lexical signals for keyword-driven retrieval.
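
The snippet below is a rough illustration of the two docid flavors, not the exact construction used for these checkpoints: the embedding dimensionality, PQ configuration (24 sub-quantizers, 8 bits each), and the token format are assumptions for demonstration only.

```python
# Illustrative only: PQ-style vs. TU-style document identifiers.
# The embedding model, PQ settings, and docid token format are assumptions,
# not the exact DDRO configuration.
import faiss
import numpy as np

d, M, nbits = 768, 24, 8                                   # embedding dim, sub-quantizers, bits per code
embeddings = np.random.rand(10_000, d).astype("float32")   # stand-in for document embeddings

pq = faiss.ProductQuantizer(d, M, nbits)
pq.train(embeddings)
codes = pq.compute_codes(embeddings)   # shape (n_docs, M): one code per sub-space

# A PQ docid is a short sequence of quantization-code tokens for one document:
pq_docid = " ".join(f"c{i}_{int(c)}" for i, c in enumerate(codes[0]))

# A TU docid instead uses surface text, e.g. title plus URL tokens:
title = "Declaration of Independence"
url = "https://en.wikipedia.org/wiki/United_States_Declaration_of_Independence"
tu_docid = f"{title} {url}"

print(pq_docid)
print(tu_docid)
```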

πŸ“Œ Models

Dataset                      Docid Type   Model Name        MRR@10   R@10
MS MARCO (MS300K)            PQ           ddro-msmarco-pq    45.76   73.02
MS MARCO (MS300K)            TU           ddro-msmarco-tu    50.07   74.01
Natural Questions (NQ320K)   PQ           ddro-nq-pq         55.51   67.31
Natural Questions (NQ320K)   TU           ddro-nq-tu         45.99   55.98

πŸš€ Intended Uses

  • Generative document retrieval and ranking
  • Open-domain question answering
  • Semantic and keyword-oriented search
  • Research and benchmarking in Information Retrieval (IR)
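
A minimal retrieval sketch follows, assuming the checkpoint loads as a standard Hugging Face T5 sequence-to-sequence model. The exact docid vocabulary and any constrained decoding (e.g., a prefix trie over valid docids) are defined in the DDRO codebase and are not shown here.

```python
# Minimal sketch: generate candidate docids for a query with beam search.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "kiyam/ddro-nq-pq"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

query = "who wrote the declaration of independence"
inputs = tokenizer(query, return_tensors="pt")

# Keep the top 10 candidates, matching an R@10-style evaluation.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    num_beams=10,
    num_return_sequences=10,
    early_stopping=True,
)
docids = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(docids)
```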

πŸ—οΈ Model Architecture

  • Base: T5-base
  • Training: Supervised Fine-tuning (SFT) + Pairwise Ranking (Direct L2R)
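
The sketch below shows one generic pairwise ranking step over docid sequence log-likelihoods (a RankNet-style logistic loss). It is a simplified stand-in, not necessarily the exact DDRO objective; the placeholder docids and the use of the base `t5-base` checkpoint are assumptions. See the paper for the precise formulation.

```python
# Hedged sketch of a pairwise ranking step over docid sequences.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def sequence_log_prob(query: str, docid: str) -> torch.Tensor:
    """Sum of token log-probabilities of `docid` given `query`."""
    enc = tokenizer(query, return_tensors="pt")
    labels = tokenizer(docid, return_tensors="pt").input_ids
    logits = model(**enc, labels=labels).logits
    log_probs = F.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum()

query = "who wrote the declaration of independence"
pos_docid = "docid_123"   # relevant docid (placeholder)
neg_docid = "docid_456"   # non-relevant docid (placeholder)

# Pairwise logistic loss: push the relevant docid's likelihood above the irrelevant one's.
loss = -F.logsigmoid(sequence_log_prob(query, pos_docid) - sequence_log_prob(query, neg_docid))
loss.backward()
```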

πŸ“– Citation

If you use these models, please cite:

@inproceedings{anonymous2025ddro,
  title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
  author={Anonymous},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25)},
  year={2025},
}

🌟 Highlights

  • No reinforcement learning or reward modeling
  • Lightweight and efficient optimization
  • Public checkpoints for reproducibility
