Commit 2e5a73d (verified) · saishshinde15, lbourdois · 1 Parent(s): a398bd8

Improve language tag (#1)

- Improve language tag (4aeb706980681eccca7fd9fa2f31a7f930d2640b)


Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1)
  1. README.md +142 -130
README.md CHANGED
@@ -1,130 +1,142 @@
1
- ---
2
- base_model:
3
- - Qwen/Qwen2.5-3B-Instruct
4
- tags:
5
- - text-generation-inference
6
- - transformers
7
- - qwen2
8
- - trl
9
- - grpo
10
- license: apache-2.0
11
- language:
12
- - en
13
- ---
14
-
15
- # TBH.AI Secure Reasoning Model
16
-
17
- - **Developed by:** TBH.AI
18
- - **License:** apache-2.0
19
- - **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
20
- - **Fine-tuning Method:** GRPO (General Reinforcement with Policy Optimization)
21
- - **Inspired by:** DeepSeek-R1
22
-
23
- ## **Model Description**
24
- TBH.AI Secure Reasoning Model is a cutting-edge AI model designed for secure, reliable, and structured reasoning. Fine-tuned on Qwen 2.5 using GRPO, it enhances logical reasoning, decision-making, and problem-solving capabilities while maintaining a strong focus on reducing AI hallucinations and ensuring factual accuracy.
25
-
26
- Unlike conventional language models that rely primarily on knowledge retrieval, TBH.AI's model is designed to autonomously engage with complex problems, breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs advanced reinforcement learning methodologies that allow it to validate and refine its logical conclusions securely and effectively.
27
-
28
- This model is particularly suited for tasks requiring high-level reasoning, structured analysis, and problem-solving in critical domains such as cybersecurity, finance, and research. It is ideal for professionals and organizations seeking AI solutions that prioritize security, transparency, and truthfulness.
29
-
30
- ## **Features**
31
- - **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency.
32
- - **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision.
33
- - **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses.
34
- - **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability.
35
-
36
- ## **Limitations**
37
- - Requires well-structured prompts for optimal reasoning depth.
38
- - Not optimized for tasks requiring extensive factual recall beyond its training scope.
39
- - Performance depends on reinforcement learning techniques and fine-tuning datasets.
40
-
41
- ## **Usage**
42
- To use this model for secure text generation and reasoning tasks, follow the structure below:
43
- ```python
44
- from transformers import AutoTokenizer, AutoModelForCausalLM
45
- import torch
46
-
47
- # Load tokenizer and model
48
- tokenizer = AutoTokenizer.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
49
- model = AutoModelForCausalLM.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
50
-
51
- # Prepare input prompt using chat template
52
- SYSTEM_PROMPT = """
53
- Respond in the following format:
54
- <reasoning>
55
- ...
56
- </reasoning>
57
- <answer>
58
- ...
59
- </answer>
60
- """
61
- text = tokenizer.apply_chat_template([
62
- {"role": "system", "content": SYSTEM_PROMPT},
63
- {"role": "user", "content": "What is 2x+3=4"},
64
- ], tokenize=False, add_generation_prompt=True)
65
-
66
- # Tokenize input
67
- input_ids = tokenizer(text, return_tensors="pt").input_ids
68
-
69
- # Move to GPU if available
70
- device = "cuda" if torch.cuda.is_available() else "cpu"
71
- model.to(device)
72
- input_ids = input_ids.to(device)
73
-
74
- # Generate response
75
- from vllm import SamplingParams
76
- sampling_params = SamplingParams(
77
- temperature=0.8,
78
- top_p=0.95,
79
- max_tokens=1024,
80
- )
81
- output = model.generate(
82
- input_ids,
83
- sampling_params=sampling_params,
84
- )
85
-
86
- # Decode and print output
87
- output_text = tokenizer.decode(output[0], skip_special_tokens=True)
88
- print(output_text)
89
- ```
90
-
91
- <details>
92
- <summary>Fast inference</summary>
93
-
94
- ```python
95
- pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
96
-
97
- text = tokenizer.apply_chat_template([
98
- {"role" : "system", "content" : SYSTEM_PROMPT},
99
- {"role" : "user", "content" : "What is 2x+3=4"},
100
- ], tokenize = False, add_generation_prompt = True)
101
-
102
- from vllm import SamplingParams
103
- sampling_params = SamplingParams(
104
- temperature = 0.8,
105
- top_p = 0.95,
106
- max_tokens = 1024,
107
- )
108
- output = model.fast_generate(
109
- text,
110
- sampling_params = sampling_params,
111
- lora_request = model.load_lora("grpo_saved_lora"),
112
- )[0].outputs[0].text
113
-
114
- output
115
- ```
116
- </details>
117
-
118
- # Recommended Prompt
119
- Use the following prompt for detailed and personalized results. This is the recommended format as the model was fine-tuned to respond in this structure:
120
-
121
- ```python
122
- You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:
123
-
124
- <reasoning>
125
- ...
126
- </reasoning>
127
- <answer>
128
- ...
129
- </answer>
130
- ```
1
+ ---
2
+ base_model:
3
+ - Qwen/Qwen2.5-3B-Instruct
4
+ tags:
5
+ - text-generation-inference
6
+ - transformers
7
+ - qwen2
8
+ - trl
9
+ - grpo
10
+ license: apache-2.0
11
+ language:
12
+ - zho
13
+ - eng
14
+ - fra
15
+ - spa
16
+ - por
17
+ - deu
18
+ - ita
19
+ - rus
20
+ - jpn
21
+ - kor
22
+ - vie
23
+ - tha
24
+ - ara
25
+ ---
26
+
27
+ # TBH.AI Secure Reasoning Model
28
+
29
+ - **Developed by:** TBH.AI
30
+ - **License:** apache-2.0
31
+ - **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
32
+ - **Fine-tuning Method:** GRPO (General Reinforcement with Policy Optimization)
33
+ - **Inspired by:** DeepSeek-R1
34
+
35
+ ## **Model Description**
36
+ TBH.AI Secure Reasoning Model is a cutting-edge AI model designed for secure, reliable, and structured reasoning. Fine-tuned on Qwen 2.5 using GRPO, it enhances logical reasoning, decision-making, and problem-solving capabilities while maintaining a strong focus on reducing AI hallucinations and ensuring factual accuracy.
37
+
38
+ Unlike conventional language models that rely primarily on knowledge retrieval, TBH.AI's model is designed to autonomously engage with complex problems, breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs advanced reinforcement learning methodologies that allow it to validate and refine its logical conclusions securely and effectively.
39
+
40
+ This model is particularly suited for tasks requiring high-level reasoning, structured analysis, and problem-solving in critical domains such as cybersecurity, finance, and research. It is ideal for professionals and organizations seeking AI solutions that prioritize security, transparency, and truthfulness.
41
+
42
+ ## **Features**
43
+ - **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency.
44
+ - **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision.
45
+ - **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses.
46
+ - **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability.
47
+
48
+ ## **Limitations**
49
+ - Requires well-structured prompts for optimal reasoning depth.
50
+ - Not optimized for tasks requiring extensive factual recall beyond its training scope.
51
+ - Performance depends on reinforcement learning techniques and fine-tuning datasets.
52
+
53
+ ## **Usage**
54
+ To use this model for secure text generation and reasoning tasks, follow the structure below:
55
+ ```python
56
+ from transformers import AutoTokenizer, AutoModelForCausalLM
57
+ import torch
58
+
59
+ # Load tokenizer and model
60
+ tokenizer = AutoTokenizer.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
61
+ model = AutoModelForCausalLM.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
62
+
63
+ # Prepare input prompt using chat template
64
+ SYSTEM_PROMPT = """
65
+ Respond in the following format:
66
+ <reasoning>
67
+ ...
68
+ </reasoning>
69
+ <answer>
70
+ ...
71
+ </answer>
72
+ """
73
+ text = tokenizer.apply_chat_template([
74
+ {"role": "system", "content": SYSTEM_PROMPT},
75
+ {"role": "user", "content": "What is 2x+3=4"},
76
+ ], tokenize=False, add_generation_prompt=True)
77
+
78
+ # Tokenize input
79
+ input_ids = tokenizer(text, return_tensors="pt").input_ids
80
+
81
+ # Move to GPU if available
82
+ device = "cuda" if torch.cuda.is_available() else "cpu"
83
+ model.to(device)
84
+ input_ids = input_ids.to(device)
85
+
86
+ # Generate response
87
+ from vllm import SamplingParams
88
+ sampling_params = SamplingParams(
89
+ temperature=0.8,
90
+ top_p=0.95,
91
+ max_tokens=1024,
92
+ )
93
+ output = model.generate(
94
+ input_ids,
95
+ sampling_params=sampling_params,
96
+ )
97
+
98
+ # Decode and print output
99
+ output_text = tokenizer.decode(output[0], skip_special_tokens=True)
100
+ print(output_text)
101
+ ```
102
+
103
+ <details>
104
+ <summary>Fast inference</summary>
105
+
106
+ ```python
107
+ pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
108
+
109
+ text = tokenizer.apply_chat_template([
110
+ {"role" : "system", "content" : SYSTEM_PROMPT},
111
+ {"role" : "user", "content" : "What is 2x+3=4"},
112
+ ], tokenize = False, add_generation_prompt = True)
113
+
114
+ from vllm import SamplingParams
115
+ sampling_params = SamplingParams(
116
+ temperature = 0.8,
117
+ top_p = 0.95,
118
+ max_tokens = 1024,
119
+ )
120
+ output = model.fast_generate(
121
+ text,
122
+ sampling_params = sampling_params,
123
+ lora_request = model.load_lora("grpo_saved_lora"),
124
+ )[0].outputs[0].text
125
+
126
+ output
127
+ ```
128
+ </details>
129
+
130
+ # Recommended Prompt
131
+ Use the following prompt for detailed and personalized results. This is the recommended format as the model was fine-tuned to respond in this structure:
132
+
133
+ ```python
134
+ You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:
135
+
136
+ <reasoning>
137
+ ...
138
+ </reasoning>
139
+ <answer>
140
+ ...
141
+ </answer>
142
+ ```
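
The README asks the model to wrap its output in `<reasoning>` and `<answer>` tags, so downstream code will usually want to separate the two parts. Below is a minimal sketch of one way to do that with the standard library. The names `ParsedResponse`, `parse_response`, and the `sample` string are hypothetical and written for illustration; they are not part of the model repository, and the sample is hand-written rather than real model output.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two sections of a reply."""
    reasoning: Optional[str]
    answer: Optional[str]


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    # DOTALL lets "." match newlines, since each section spans several lines.
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> ParsedResponse:
    """Split a reply into its reasoning and answer sections."""
    return ParsedResponse(_extract("reasoning", text), _extract("answer", text))


# Hand-written sample in the recommended format (an assumption, not model output).
sample = """<reasoning>
Subtract 3 from both sides: 2x = 1. Divide by 2.
</reasoning>
<answer>
x = 0.5
</answer>"""

parsed = parse_response(sample)
print(parsed.answer)
```

A missing section comes back as `None` rather than raising, which lets the caller decide how to handle replies that do not follow the format.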