corag/CoRAG-Llama3.1-8B-MultihopQA

intfloat committed on Mar 18

Commit abd9336 · verified · 1 Parent(s): 5f58302

Update README.md

Files changed (1)

README.md +40 -3

@@ -1,3 +1,40 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# CoRAG-Llama3.1-8B-MultihopQA
+This is the CoRAG-8B model fine-tuned on [MultihopQA data](https://huggingface.co/datasets/corag/multihopqa) in the paper [Chain-of-Retrieval Augmented Generation](https://arxiv.org/abs/2501.14342).
+## Model Evaluation
+| **Model**                              | **2WikiQA EM** | **2WikiQA F1** | **HotpotQA EM** | **HotpotQA F1** | **Bamboogle EM** | **Bamboogle F1** | **MuSiQue EM** | **MuSiQue F1** |
+|----------------------------------------|----------------|----------------|------------------|------------------|------------------|------------------|----------------|----------------|
+| **3-shot Llama-3.1-8B-Inst.**          | 30.7           | 39.9           | 34.1             | 46.6             | 28.0             | 37.3             | 7.7            | 15.4           |
+| **3-shot GPT-4o**                      | 49.0           | 56.2           | 45.8             | 59.4             | 53.6             | 63.8             | 15.7           | 25.8           |
+| **Fine-tuned Llama-8B w/ E5<sub>large</sub>** | 55.1           | 60.7           | 50.3             | 63.5             | 40.8             | 53.7             | 17.4           | 28.1           |
+| **CoRAG-8B (Ours)**                     |                |                |                  |                  |                  |                  |                |                |
+|   > L=1, greedy              | 56.5           | 62.3           | 50.1             | 63.2             | 37.6             | 51.4             | 18.6           | 29.3           |
+|   > L=6, greedy              | 70.6           | 75.5           | 54.4             | 67.5             | 48.0             | 63.5             | 27.7           | 38.5           |
+|   > L=6, best-of-4           | 71.7           | 76.5           | 55.3             | 68.5             | 51.2             | 63.1             | 28.1           | 39.7           |
+|   > L=6, tree search         | 71.7           | 76.4           | 55.8             | 69.0             | 48.8             | 64.4             | 29.0           | 40.3           |
+|   > L=10, best-of-8          | **72.5**       | **77.3**       | **56.3**         | **69.8**         | **54.4**         | **68.3**         | **30.9**       | **42.4**       |
+Please refer to [TODO](-) for evaluation instructions.
+Model predictions are available as the `predictions` field at https://huggingface.co/datasets/corag/multihopqa
+## Disclaimer
+This model has been specifically trained for the task of MultihopQA. It may not perform well on other tasks.
+## References
+```
+@article{wang2025chain,
+  title={Chain-of-Retrieval Augmented Generation},
+  author={Wang, Liang and Chen, Haonan and Yang, Nan and Huang, Xiaolong and Dou, Zhicheng and Wei, Furu},
+  journal={arXiv preprint arXiv:2501.14342},
+  year={2025}
+}
+```
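
The results table in the README added by this commit reports EM (exact match) and F1 for each benchmark, but the link to evaluation instructions is still a placeholder. As a rough illustration of what these two metrics usually mean for short-answer QA, here is a minimal sketch. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles) and a single gold answer per question; neither assumption is confirmed by the README, and the function names `normalize_answer`, `exact_match`, and `f1_score` are chosen here for illustration only.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the README does not say
    which normalization the reported numbers use.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Averaging these per-question scores over a benchmark and multiplying by 100 would give numbers on the same scale as the table. Datasets with several acceptable gold answers typically take the maximum score over the alternatives, which this sketch leaves out.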