Improve model card and add missing information
#1
by nielsr (HF Staff) - opened

README.md CHANGED
Changes to the previous README.md:

````diff
@@ -3,21 +3,33 @@ library_name: transformers
 tags:
 - generated_from_trainer
 - open-r1
-
 ---

-# Model Card for

-This
-It has been trained using [TRL](https://github.com/huggingface/trl).

-

 ```python
 from transformers import pipeline

 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
@@ -26,8 +38,7 @@ print(output["generated_text"])

 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

-
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

 ### Framework versions

@@ -39,7 +50,14 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing

 ## Citations

-

 ```bibtex
 @article{zhihong2024deepseekmath,
@@ -48,11 +66,8 @@ Cite GRPO as:
 year = 2024,
 eprint = {arXiv:2402.03300},
 }
-
 ```

-Cite TRL as:
-
 ```bibtex
 @misc{vonwerra2022trl,
 title = {{TRL: Transformer Reinforcement Learning}},
@@ -62,4 +77,6 @@ Cite TRL as:
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
````
The updated README.md, with unchanged lines omitted:

---
library_name: transformers
tags:
- generated_from_trainer
- open-r1
license: cc-by-4.0
pipeline_tag: text-generation
---

# Model Card for CANOE Models

This repository contains several fine-tuned LLMs trained using the CANOE framework, as described in [Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning](https://huggingface.co/papers/2505.16483). CANOE improves the contextual faithfulness of LLMs in both short-form and long-form generation without requiring human annotations. It synthesizes short-form question-answering data and employs a rule-based reinforcement learning method (Dual-GRPO) to optimize response generation.

## Available Models

Here is a list of the available CANOE models:

| Model | Hugging Face Checkpoint | Base Model | Description |
|-------|-------------------------|------------|-------------|
| **CANOE-LLaMA3-8B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-LLaMA3-8B) | `meta-llama/Meta-Llama-3-8B-Instruct` | Chat model, based on LLaMA3-Instruct-8B. |
| **CANOE-Qwen2.5-7B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-7B) | `Qwen/Qwen2.5-7B-Instruct` | Chat model, based on Qwen2.5-Instruct-7B. |
| **CANOE-Qwen2.5-14B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-14B) | `Qwen/Qwen2.5-14B-Instruct` | Chat model, based on Qwen2.5-Instruct-14B. |

## Quick Start
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Swap in any of the CANOE checkpoints listed in the table above.
generator = pipeline("text-generation", model="ssz1111/CANOE-LLaMA3-8B", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
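If you need more control over loading and decoding than the pipeline exposes, the checkpoint can also be loaded directly with `AutoModelForCausalLM`. The snippet below is a minimal sketch, assuming the checkpoint ships a chat template (as instruct-tuned models typically do); any of the checkpoints from the table can be substituted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any CANOE checkpoint from the table above can be used here.
model_id = "ssz1111/CANOE-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```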

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)

This model was trained using the CANOE framework, which synthesizes short-form question-answering data and uses a rule-based reinforcement learning method (Dual-GRPO). The training data and evaluation datasets are detailed in the GitHub README.
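To make "rule-based reinforcement learning" concrete, the sketch below shows what a simple exact-match reward for synthesized short-form QA could look like when wired into TRL's standard `GRPOTrainer`. It is an illustration only, not the CANOE/Dual-GRPO training code (which lives in the GitHub repository); the dataset path and column names are hypothetical.

```python
# Illustrative sketch only -- NOT the actual CANOE / Dual-GRPO implementation.
# It plugs a rule-based (exact-match) reward for short-form QA into TRL's
# standard GRPOTrainer. Dataset path and column names are hypothetical.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer


def exact_match_reward(completions, answer, **kwargs):
    """Reward 1.0 if the gold short answer appears in the completion, else 0.0."""
    rewards = []
    for completion, gold in zip(completions, answer):
        # Conversational datasets yield completions as lists of message dicts.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        rewards.append(1.0 if gold.strip().lower() in text.lower() else 0.0)
    return rewards


# Hypothetical synthesized short-form QA data with "prompt" and "answer" columns.
dataset = load_dataset("json", data_files="synthetic_short_form_qa.json", split="train")

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # base model of CANOE-LLaMA3-8B
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="grpo-faithfulness-sketch", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()
```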

### Framework versions

## Citations

```bibtex
@article{si2025teaching,
  title = {Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning},
  author = {Si, Shuzheng and Zhao, Haozhe and Gao, Cheng and Bai, Yuzhuo and Wang, Zhitong and Gao, Bofei and Luo, Kangyang and Li, Wenhao and Huang, Yufei and Chen, Gang and others},
  journal = {arXiv preprint arXiv:2505.16483},
  year = {2025}
}
```

```bibtex
@article{zhihong2024deepseekmath,
  year = 2024,
  eprint = {arXiv:2402.03300},
}
```

```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

GitHub repository: [CANOE on GitHub](https://github.com/huggingface/CANOE)