lbourdois committed
Commit 52b17b0 · verified · 1 Parent(s): e220baa

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
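Once merged, the new tags can be checked programmatically. A minimal sketch using the `huggingface_hub` client (assuming the default revision; the expected output reflects the list added in this PR):

```python
from huggingface_hub import ModelCard

# Load the model card (README.md) for this repo from the Hub.
card = ModelCard.load("AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct")

# `language` is the front-matter field this PR extends.
print(card.data.language)  # expected: ["zho", "eng", "fra", ..., "ara"]
```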

Files changed (1)
  1. README.md +102 -88
README.md CHANGED
@@ -1,89 +1,103 @@
- ---
- base_model:
- - Qwen/Qwen2.5-3B-Instruct
- pipeline_tag: text-generation
- library_name: transformers
- ---
- # AbleCredit Reasoner R0 Qwen 2.5 3B Instruct
-
- ## Introduction
-
- This model was trained with DeepSeek-R1-style (GRPO) reinforcement learning on Qwen 2.5 3B Instruct as the base model.
- It is primarily intended for research on applying small LLMs trained with GRPO/RL in domains such as finance and credit underwriting.
-
- ### Model Description
-
- - **Fine-tuned by:** AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
- - **License:** We've retained the original Qwen research license. Note that the license does not allow commercial use.
- - **Fine-tuned from model:** Qwen/Qwen2.5-3B-Instruct
-
- ## How to Get Started with the Model
-
- Use the model with a standard Hugging Face setup:
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct"  # or local path to model
-
- system_prompt = {
-     "role": "system",
-     "content": (
-         "You are a helpful assistant. User asks a question the assistant answers it.\n"
-         "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
-     ),
- }
-
- # Seed the assistant turn so generation continues inside the <think> block.
- suffix_prompt = {
-     "role": "assistant",
-     "content": "Let me solve this step by step.\n<think>",
- }
-
- prompt_msgs = [
-     system_prompt,
-     {"role": "user", "content": "What is 15 times 3 ?"},
-     suffix_prompt,
- ]
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     device_map="auto",
-     torch_dtype=torch.bfloat16,
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- # Render the chat as text, continuing the seeded assistant message
- # instead of opening a new turn.
- prompt = tokenizer.apply_chat_template(
-     prompt_msgs,
-     tokenize=False,
-     continue_final_message=True,
-     add_generation_prompt=False,
- )
-
- # Tokenize the prompt and move it to the model's device.
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-
- print("\nGenerating response...\n")
- outputs = model.generate(
-     **inputs,
-     max_new_tokens=1024,
-     temperature=0.5,
-     min_p=0.01,
- )
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print("\nResponse:\n", response)
- ```
-
- ## Training Details
-
- ### Training Data
-
- Trained on open-source logical-reasoning datasets and a proprietary finance dataset created by AbleCredit.com.
-
- ### Training Procedure
-
- Trained with DeepSeek-style reinforcement learning, using GRPO with rule-based rewards.
-
- ## Evaluation
-
- - The model achieves a score of ~67% on the GSM8K benchmark in a **zero-shot** setting (see the benchmarking script for details).
-
- ## Model Card Contact
-
  [contact Harshad Saykhedkar via LinkedIn](https://www.linkedin.com/in/harshadss/)
 
+ ---
+ base_model:
+ - Qwen/Qwen2.5-3B-Instruct
+ pipeline_tag: text-generation
+ library_name: transformers
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ # AbleCredit Reasoner R0 Qwen 2.5 3B Instruct
+
+ ## Introduction
+
+ This model was trained with DeepSeek-R1-style (GRPO) reinforcement learning on Qwen 2.5 3B Instruct as the base model.
+ It is primarily intended for research on applying small LLMs trained with GRPO/RL in domains such as finance and credit underwriting.
+
+ ### Model Description
+
+ - **Fine-tuned by:** AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
+ - **License:** We've retained the original Qwen research license. Note that the license does not allow commercial use.
+ - **Fine-tuned from model:** Qwen/Qwen2.5-3B-Instruct
+
+ ## How to Get Started with the Model
+
+ Use the model with a standard Hugging Face setup:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct"  # or local path to model
+
+ system_prompt = {
+     "role": "system",
+     "content": (
+         "You are a helpful assistant. User asks a question the assistant answers it.\n"
+         "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
+     ),
+ }
+
+ # Seed the assistant turn so generation continues inside the <think> block.
+ suffix_prompt = {
+     "role": "assistant",
+     "content": "Let me solve this step by step.\n<think>",
+ }
+
+ prompt_msgs = [
+     system_prompt,
+     {"role": "user", "content": "What is 15 times 3 ?"},
+     suffix_prompt,
+ ]
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Render the chat as text, continuing the seeded assistant message
+ # instead of opening a new turn.
+ prompt = tokenizer.apply_chat_template(
+     prompt_msgs,
+     tokenize=False,
+     continue_final_message=True,
+     add_generation_prompt=False,
+ )
+
+ # Tokenize the prompt and move it to the model's device.
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ print("\nGenerating response...\n")
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=1024,
+     temperature=0.5,
+     min_p=0.01,
+ )
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print("\nResponse:\n", response)
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ Trained on open-source logical-reasoning datasets and a proprietary finance dataset created by AbleCredit.com.
+
+ ### Training Procedure
+
+ Trained with DeepSeek-style reinforcement learning, using GRPO with rule-based rewards.
+
+ ## Evaluation
+
+ - The model achieves a score of ~67% on the GSM8K benchmark in a **zero-shot** setting (see the benchmarking script for details).
+
+ ## Model Card Contact
+
  [contact Harshad Saykhedkar via LinkedIn](https://www.linkedin.com/in/harshadss/)
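A note on consuming the output: the card's example seeds the assistant turn with an opening `<think>` tag, so downstream code typically needs to separate the reasoning trace from the final answer. A minimal sketch, assuming the model closes the trace with `</think>` in the R1 style (that closing tag is an assumption, not confirmed by the card):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    # Split an R1-style completion into (reasoning, answer).
    # Assumes reasoning is wrapped as <think>...</think>; if no closing
    # tag is present, the whole response is treated as the answer.
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    return match.group(1).strip(), response[match.end():].strip()

reasoning, answer = split_reasoning(response)  # `response` from the example above
print("Reasoning:\n", reasoning)
print("Answer:\n", answer)
```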