Improve model card and add missing information
#1
by nielsr (HF Staff) - opened

README.md CHANGED
Changes to the previous README.md:

````diff
@@ -3,21 +3,33 @@ library_name: transformers
 tags:
 - generated_from_trainer
 - open-r1
-
 ---

-# Model Card for

-This
-It has been trained using [TRL](https://github.com/huggingface/trl).

-

 ```python
 from transformers import pipeline

 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
@@ -26,8 +38,7 @@ print(output["generated_text"])

 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

-
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

 ### Framework versions

@@ -39,7 +50,14 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing

 ## Citations

-

 ```bibtex
 @article{zhihong2024deepseekmath,
@@ -48,11 +66,8 @@ Cite GRPO as:
 year = 2024,
 eprint = {arXiv:2402.03300},
 }
-
 ```

-Cite TRL as:
-
 ```bibtex
 @misc{vonwerra2022trl,
 title = {{TRL: Transformer Reinforcement Learning}},
@@ -62,4 +77,6 @@ Cite TRL as:
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
````
The updated README.md, with unchanged lines omitted:

---
library_name: transformers
tags:
- generated_from_trainer
- open-r1
license: cc-by-4.0
pipeline_tag: text-generation
---

# Model Card for CANOE Models

This repository contains several fine-tuned LLMs trained using the CANOE framework, as described in [Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning](https://huggingface.co/papers/2505.16483). CANOE improves the contextual faithfulness of LLMs in both short-form and long-form generation without requiring human annotations. It synthesizes short-form question-answering data and employs a rule-based reinforcement learning method (Dual-GRPO) to optimize response generation.

## Available Models

Here is a list of the available CANOE models:

| Model | Hugging Face Checkpoint | Base Model | Description |
|-------|-------------------------|------------|-------------|
| **CANOE-LLaMA3-8B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-LLaMA3-8B) | `meta-llama/Meta-Llama-3-8B-Instruct` | Chat model, based on LLaMA3-Instruct-8B. |
| **CANOE-Qwen2.5-7B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-7B) | `Qwen/Qwen2.5-7B-Instruct` | Chat model, based on Qwen2.5-Instruct-7B. |
| **CANOE-Qwen2.5-14B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-14B) | `Qwen/Qwen2.5-14B-Instruct` | Chat model, based on Qwen2.5-Instruct-14B. |

## Quick Start
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Swap in any of the CANOE checkpoints listed in the table above.
generator = pipeline("text-generation", model="ssz1111/CANOE-LLaMA3-8B", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
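If you need more control over loading and decoding than the pipeline exposes, the checkpoint can also be loaded directly with `AutoModelForCausalLM`. The snippet below is a minimal sketch, assuming the checkpoint ships a chat template (as instruct-tuned models typically do); any of the checkpoints from the table can be substituted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any CANOE checkpoint from the table above can be used here.
model_id = "ssz1111/CANOE-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```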

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

This model was trained using the CANOE framework, which synthesizes short-form question-answering data and uses a rule-based reinforcement learning method (Dual-GRPO). The training data and evaluation datasets are detailed in the GitHub README.
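To make "rule-based reinforcement learning" concrete, the sketch below shows what a simple exact-match reward for synthesized short-form QA could look like when wired into TRL's standard `GRPOTrainer`. It is an illustration only, not the CANOE/Dual-GRPO training code (which lives in the GitHub repository); the dataset path and column names are hypothetical.

```python
# Illustrative sketch only -- NOT the actual CANOE / Dual-GRPO implementation.
# It plugs a rule-based (exact-match) reward for short-form QA into TRL's
# standard GRPOTrainer. Dataset path and column names are hypothetical.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer


def exact_match_reward(completions, answer, **kwargs):
    """Reward 1.0 if the gold short answer appears in the completion, else 0.0."""
    rewards = []
    for completion, gold in zip(completions, answer):
        # Conversational datasets yield completions as lists of message dicts.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        rewards.append(1.0 if gold.strip().lower() in text.lower() else 0.0)
    return rewards


# Hypothetical synthesized short-form QA data with "prompt" and "answer" columns.
dataset = load_dataset("json", data_files="synthetic_short_form_qa.json", split="train")

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # base model of CANOE-LLaMA3-8B
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="grpo-faithfulness-sketch", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()
```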

### Framework versions

## Citations

```bibtex
@article{si2025teaching,
  title = {Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning},
  author = {Si, Shuzheng and Zhao, Haozhe and Gao, Cheng and Bai, Yuzhuo and Wang, Zhitong and Gao, Bofei and Luo, Kangyang and Li, Wenhao and Huang, Yufei and Chen, Gang and others},
  journal = {arXiv preprint arXiv:2505.16483},
  year = {2025}
}
```

```bibtex
@article{zhihong2024deepseekmath,
  year = 2024,
  eprint = {arXiv:2402.03300},
}
```

```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

GitHub repository: [CANOE on GitHub](https://github.com/huggingface/CANOE)