shreyasmeher commited on
Commit
0df242e
·
verified ·
1 Parent(s): 556ea25

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +24 -0
README.md CHANGED
@@ -15,6 +15,30 @@ pipeline_tag: text-classification
15
  [![Model](https://img.shields.io/badge/Base_Model-Qwen2.5--3B--Instruct-purple)](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
16
  [![License](https://img.shields.io/badge/License-Apache_2.0-red)](https://www.apache.org/licenses/LICENSE-2.0)
17
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
  ## Reinforcement Learning Highlights
19
  Unlike traditional supervised fine-tuning (used in ConflLlama), this model uses GRPO to:
20
  1. **Optimize multiple reward signals** simultaneously
 
15
  [![Model](https://img.shields.io/badge/Base_Model-Qwen2.5--3B--Instruct-purple)](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
16
  [![License](https://img.shields.io/badge/License-Apache_2.0-red)](https://www.apache.org/licenses/LICENSE-2.0)
17
 
18
+
19
+ ## Important Usage Note
20
+
21
+ **Essential:** When using this model, you **must** set the prompt as described below to ensure the model follows the required structured reasoning format. Without explicitly setting the prompt, the model's outputs may not adhere to the expected XML structure and reasoning guidelines.
22
+
23
+ For instance, include the following prompt in your inference code:
24
+
25
+ ```python
26
+ prompt = """
27
+ Respond in the following format:
28
+ <reasoning>
29
+ 1. Triggers detected: [List any event triggers]
30
+ 2. Participants and organizers: [List any actors involved]
31
+ 3. Location details: [Specify the location]
32
+ 4. Violence assessment: [Indicate if violent or non-violent]
33
+ 5. Event category determination: [State and justify the category]
34
+ </reasoning>
35
+ <answer>
36
+ [Final category]
37
+ </answer>
38
+ """
39
+ ```
40
+
41
+
42
  ## Reinforcement Learning Highlights
43
  Unlike traditional supervised fine-tuning (used in ConflLlama), this model uses GRPO to:
44
  1. **Optimize multiple reward signals** simultaneously