---
license: apache-2.0
---

# CoRAG-Llama3.1-8B-MultihopQA

This is the CoRAG-8B model fine-tuned on [MultihopQA data](https://huggingface.co/datasets/corag/multihopqa), as described in the paper [Chain-of-Retrieval Augmented Generation](https://arxiv.org/abs/2501.14342).

## Model Evaluation

| **Model** | **2WikiQA EM** | **2WikiQA F1** | **HotpotQA EM** | **HotpotQA F1** | **Bamboogle EM** | **Bamboogle F1** | **MuSiQue EM** | **MuSiQue F1** |
|---|---|---|---|---|---|---|---|---|
| **3-shot Llama-3.1-8B-Inst.** | 30.7 | 39.9 | 34.1 | 46.6 | 28.0 | 37.3 | 7.7 | 15.4 |
| **3-shot GPT-4o** | 49.0 | 56.2 | 45.8 | 59.4 | 53.6 | 63.8 | 15.7 | 25.8 |
| **Fine-tuned Llama-8B w/ E5-large** | 55.1 | 60.7 | 50.3 | 63.5 | 40.8 | 53.7 | 17.4 | 28.1 |
| **CoRAG-8B (Ours)** | | | | | | | | |
| > L=1, greedy | 56.5 | 62.3 | 50.1 | 63.2 | 37.6 | 51.4 | 18.6 | 29.3 |
| > L=6, greedy | 70.6 | 75.5 | 54.4 | 67.5 | 48.0 | 63.5 | 27.7 | 38.5 |
| > L=6, best-of-4 | 71.7 | 76.5 | 55.3 | 68.5 | 51.2 | 63.1 | 28.1 | 39.7 |
| > L=6, tree search | 71.7 | 76.4 | 55.8 | 69.0 | 48.8 | 64.4 | 29.0 | 40.3 |
| > L=10, best-of-8 | **72.5** | **77.3** | **56.3** | **69.8** | **54.4** | **68.3** | **30.9** | **42.4** |

Here, *L* denotes the maximum retrieval chain length at inference time, and greedy / best-of-N / tree search are the decoding strategies over retrieval chains.

Please refer to [https://github.com/microsoft/LMOps/tree/main/corag](https://github.com/microsoft/LMOps/tree/main/corag) for evaluation instructions. Model predictions are available in the `predictions` field of https://huggingface.co/datasets/corag/multihopqa.

## Disclaimer

This model has been trained specifically for multi-hop question answering and may not perform well on other tasks.

## References

```
@article{wang2025chain,
  title={Chain-of-Retrieval Augmented Generation},
  author={Wang, Liang and Chen, Haonan and Yang, Nan and Huang, Xiaolong and Dou, Zhicheng and Wei, Furu},
  journal={arXiv preprint arXiv:2501.14342},
  year={2025}
}
```
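
## Usage

The snippet below is a minimal sketch of loading the checkpoint with the `transformers` library and running a single greedy generation. The repository id, chat template usage, and example question are assumptions made for illustration; this does not reproduce the retrieval-augmented decoding (greedy chain, best-of-N, tree search) behind the numbers above, for which you should follow the evaluation instructions in the LMOps repository linked earlier.

```python
# Minimal sketch: load the fine-tuned checkpoint with Hugging Face transformers.
# The repo id below is assumed from this model card's title; adjust if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "corag/CoRAG-Llama3.1-8B-MultihopQA"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt only: the official evaluation pipeline interleaves retrieval
# with generation, whereas this snippet just queries the bare model once.
messages = [
    {"role": "user", "content": "Who succeeded the first President of Namibia?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```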