nielsr (HF Staff) committed on
Commit dfa8466 · verified · 1 Parent(s): 262c0be

Improve model card and add missing information


This PR improves the model card by:

- Correcting the `licence` field to `license`.
- Replacing placeholder model names with the actual model names from the README.
- Adding the correct pipeline tag (`text-generation`).
- Providing a more detailed description of the model based on the paper abstract.
- Adding a link to the GitHub repository for the project.
- Specifying the license as `cc-by-4.0` based on information found in the GitHub README.
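Once merged, the corrected metadata can be checked programmatically. A minimal sketch using `huggingface_hub` (the repo id `ssz1111/CANOE-LLaMA3-8B` is taken from the model table in the diff below; the printed values are what this PR is expected to produce, not guaranteed output):

```python
from huggingface_hub import ModelCard

# Load the model card for one of the CANOE checkpoints listed in the README.
card = ModelCard.load("ssz1111/CANOE-LLaMA3-8B")

# After this PR, the YAML front matter should expose the corrected fields:
# `license: cc-by-4.0` (not the misspelled `licence`) and
# `pipeline_tag: text-generation`.
print(card.data.license)       # expected: cc-by-4.0
print(card.data.pipeline_tag)  # expected: text-generation
```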

Files changed (1): README.md (+30 -13)
README.md CHANGED
@@ -3,21 +3,33 @@ library_name: transformers
 tags:
 - generated_from_trainer
 - open-r1
-licence: license
+license: cc-by-4.0
+pipeline_tag: text-generation
 ---
 
-# Model Card for None
+# Model Card for CANOE Models
 
-This model is a fine-tuned version of [None](https://huggingface.co/None).
-It has been trained using [TRL](https://github.com/huggingface/trl).
+This repository contains several fine-tuned LLMs trained using the CANOE framework, as described in [Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning](https://huggingface.co/papers/2505.16483). CANOE improves the contextual faithfulness of LLMs in both short-form and long-form generation without requiring human annotations. It synthesizes short-form question-answering data and employs a rule-based reinforcement learning method (Dual-GRPO) to optimize response generation.
 
-## Quick start
+## Available Models
+
+Here is a list of the available CANOE models:
+
+| Model | Hugging Face Checkpoint | Base Model | Description |
+|-----------------------|------------------------------------------------------------|----------------------------------|--------------------------------------------|
+| **CANOE-LLaMA3-8B**   | [🤗 Link](https://huggingface.co/ssz1111/CANOE-LLaMA3-8B)  | `meta-llama/Llama-3-8b-instruct` | Chat model, based on LLaMA3-Instruct-8B.   |
+| **CANOE-Qwen2.5-7B**  | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-7B) | `Qwen/Qwen-2.5-Instruct-7B`      | Chat model, based on Qwen2.5-Instruct-7B.  |
+| **CANOE-Qwen2.5-14B** | [🤗 Link](https://huggingface.co/ssz1111/CANOE-Qwen2.5-14B)| `Qwen/Qwen-2.5-Instruct-14B`     | Chat model, based on Qwen2.5-Instruct-14B. |
+
+## Quick Start
 
 ```python
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="None", device="cuda")
+generator = pipeline("text-generation", model="ssz1111/CANOE-LLaMA3-8B", device="cuda")  # Use a specific model here
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
@@ -26,8 +38,7 @@ print(output["generated_text"])
 
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/sishuzheng/huggingface/runs/339l70ik)
 
-
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+This model was trained using the CANOE framework, which synthesizes short-form question-answering data and uses a rule-based reinforcement learning method (Dual-GRPO). The training data and evaluation datasets are detailed in the GitHub README.
 
 ### Framework versions
 
@@ -39,7 +50,14 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing
 
 ## Citations
 
-Cite GRPO as:
+```bibtex
+@article{si2025teaching,
+title={Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning},
+author={Si, Shuzheng and Zhao, Haozhe and Gao, Cheng and Bai, Yuzhuo and Wang, Zhitong and Gao, Bofei and Luo, Kangyang and Li, Wenhao and Huang, Yufei and Chen, Gang and others},
+journal={arXiv preprint arXiv:2505.16483},
+year={2025}
+}
+```
 
 ```bibtex
 @article{zhihong2024deepseekmath,
@@ -48,11 +66,8 @@ Cite GRPO as:
 year = 2024,
 eprint = {arXiv:2402.03300},
 }
-
 ```
 
-Cite TRL as:
-
 ```bibtex
 @misc{vonwerra2022trl,
 title = {{TRL: Transformer Reinforcement Learning}},
@@ -62,4 +77,6 @@ Cite TRL as:
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
+```
+
+Github repository: [CANOE Github](https://github.com/huggingface/CANOE)
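
Because CANOE targets contextual faithfulness rather than open-ended chat, the new Quick Start is easiest to sanity-check with a grounding passage in the prompt. Below is a minimal sketch reusing the card's `pipeline` call; the context-plus-question prompt format is an assumption for illustration (the exact prompt template is documented in the project's GitHub README), and `ssz1111/CANOE-LLaMA3-8B` is the checkpoint from the table above:

```python
from transformers import pipeline

# Load one of the CANOE checkpoints from the model table (requires a GPU,
# matching the card's `device="cuda"` usage).
generator = pipeline("text-generation", model="ssz1111/CANOE-LLaMA3-8B", device="cuda")

# A grounding passage plus a question whose answer must come from that passage.
# The "instruction + context + question in one user turn" layout below is an
# assumption, not the paper's documented prompt template.
context = (
    "The city of Willemstad is the capital of Curacao, an island country "
    "in the southern Caribbean Sea."
)
question = "What is the capital of Curacao?"
prompt = (
    "Answer using only the context below.\n\n"
    f"Context: {context}\n\nQuestion: {question}"
)

output = generator(
    [{"role": "user", "content": prompt}],
    max_new_tokens=64,
    return_full_text=False,
)[0]
print(output["generated_text"])  # a faithful model should answer "Willemstad"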