saishshinde15/TBH.AI_Base_Reasoning

@@ -1,130 +1,142 @@
----
-base_model:
-- Qwen/Qwen2.5-3B-Instruct
-tags:
-- text-generation-inference
-- transformers
-- qwen2
-- trl
-- grpo
-license: apache-2.0
-language:
-- en
----
-# TBH.AI Secure Reasoning Model
-- **Developed by:** TBH.AI
-- **License:** apache-2.0
-- **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
-- **Fine-tuning Method:** GRPO (General Reinforcement with Policy Optimization)
-- **Inspired by:** DeepSeek-R1
-## **Model Description**
-TBH.AI Secure Reasoning Model is a cutting-edge AI model designed for secure, reliable, and structured reasoning. Fine-tuned on Qwen 2.5 using GRPO, it enhances logical reasoning, decision-making, and problem-solving capabilities while maintaining a strong focus on reducing AI hallucinations and ensuring factual accuracy.
-Unlike conventional language models that rely primarily on knowledge retrieval, TBH.AI's model is designed to autonomously engage with complex problems, breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs advanced reinforcement learning methodologies that allow it to validate and refine its logical conclusions securely and effectively.
-This model is particularly suited for tasks requiring high-level reasoning, structured analysis, and problem-solving in critical domains such as cybersecurity, finance, and research. It is ideal for professionals and organizations seeking AI solutions that prioritize security, transparency, and truthfulness.
-## **Features**
-- **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency.
-- **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision.
-- **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses.
-- **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability.
-## **Limitations**
-- Requires well-structured prompts for optimal reasoning depth.
-- Not optimized for tasks requiring extensive factual recall beyond its training scope.
-- Performance depends on reinforcement learning techniques and fine-tuning datasets.
-## **Usage**
-To use this model for secure text generation and reasoning tasks, follow the structure below:
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import torch
-# Load tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
-model = AutoModelForCausalLM.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
-# Prepare input prompt using chat template
-SYSTEM_PROMPT = """
-Respond in the following format:
-<reasoning>
-...
-</reasoning>
-<answer>
-...
-</answer>
-"""
-text = tokenizer.apply_chat_template([
-    {"role": "system", "content": SYSTEM_PROMPT},
-    {"role": "user", "content": "What is 2x+3=4"},
-], tokenize=False, add_generation_prompt=True)
-# Tokenize input
-input_ids = tokenizer(text, return_tensors="pt").input_ids
-# Move to GPU if available
-device = "cuda" if torch.cuda.is_available() else "cpu"
-model.to(device)
-input_ids = input_ids.to(device)
-# Generate response
-from vllm import SamplingParams
-sampling_params = SamplingParams(
-    temperature=0.8,
-    top_p=0.95,
-    max_tokens=1024,
-)
-output = model.generate(
-    input_ids,
-    sampling_params=sampling_params,
-)
-# Decode and print output
-output_text = tokenizer.decode(output[0], skip_special_tokens=True)
-print(output_text)
-```
-<details>
-<summary>Fast inference</summary>
-```python
-pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-text = tokenizer.apply_chat_template([
-    {"role" : "system", "content" : SYSTEM_PROMPT},
-    {"role" : "user", "content" : "What is 2x+3=4"},
-], tokenize = False, add_generation_prompt = True)
-from vllm import SamplingParams
-sampling_params = SamplingParams(
-    temperature = 0.8,
-    top_p = 0.95,
-    max_tokens = 1024,
-)
-output = model.fast_generate(
-    text,
-    sampling_params = sampling_params,
-    lora_request = model.load_lora("grpo_saved_lora"),
-)[0].outputs[0].text
-output
-```
-</details>
-# Recommended Prompt
-Use the following prompt for detailed and personalized results. This is the recommended format as the model was fine-tuned to respond in this structure:
-```python
-You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:
-<reasoning>
-...
-</reasoning>
-<answer>
-...
-</answer>
-```

+---
+base_model:
+- Qwen/Qwen2.5-3B-Instruct
+tags:
+- text-generation-inference
+- transformers
+- qwen2
+- trl
+- grpo
+license: apache-2.0
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+---
+# TBH.AI Secure Reasoning Model
+- **Developed by:** TBH.AI
+- **License:** apache-2.0
+- **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
+- **Fine-tuning Method:** GRPO (Group Relative Policy Optimization)
+- **Inspired by:** DeepSeek-R1
+## **Model Description**
+TBH.AI Secure Reasoning Model is designed for secure, reliable, and structured reasoning. Fine-tuned from Qwen2.5-3B-Instruct using GRPO, it targets stronger logical reasoning, decision-making, and problem-solving, with a focus on reducing hallucinations and improving factual accuracy.
+Unlike conventional language models that rely primarily on knowledge retrieval, TBH.AI's model is designed to autonomously engage with complex problems, breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs advanced reinforcement learning methodologies that allow it to validate and refine its logical conclusions securely and effectively.
+This model is particularly suited for tasks requiring high-level reasoning, structured analysis, and problem-solving in critical domains such as cybersecurity, finance, and research. It is ideal for professionals and organizations seeking AI solutions that prioritize security, transparency, and truthfulness.
+## **Features**
+- **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency.
+- **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision.
+- **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses.
+- **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability.
+## **Limitations**
+- Requires well-structured prompts for optimal reasoning depth.
+- Not optimized for tasks requiring extensive factual recall beyond its training scope.
+- Performance depends on reinforcement learning techniques and fine-tuning datasets.
+## **Usage**
+To use this model for secure text generation and reasoning tasks, follow the structure below:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
+model = AutoModelForCausalLM.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
+# Prepare input prompt using chat template
+SYSTEM_PROMPT = """
+Respond in the following format:
+<reasoning>
+...
+</reasoning>
+<answer>
+...
+</answer>
+"""
+text = tokenizer.apply_chat_template([
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {"role": "user", "content": "What is 2x+3=4"},
+], tokenize=False, add_generation_prompt=True)
+# Tokenize input
+input_ids = tokenizer(text, return_tensors="pt").input_ids
+# Move to GPU if available
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+input_ids = input_ids.to(device)
+# Generate response
+# Sampling settings are passed directly to generate();
+# vLLM's SamplingParams is not accepted by a transformers model.
+output = model.generate(
+    input_ids,
+    do_sample=True,
+    temperature=0.8,
+    top_p=0.95,
+    max_new_tokens=1024,
+)
+# Decode and print output
+output_text = tokenizer.decode(output[0], skip_special_tokens=True)
+print(output_text)
+```
+<details>
+<summary>Fast inference</summary>
+```python
+# Run in a shell first, not in Python:
+#   pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+# Assumption: `model` and `tokenizer` were loaded through Unsloth with fast inference enabled.
+# `fast_generate` and `load_lora` are not methods of a plain transformers model.
+from vllm import SamplingParams
+
+text = tokenizer.apply_chat_template([
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {"role": "user", "content": "What is 2x+3=4"},
+], tokenize=False, add_generation_prompt=True)
+sampling_params = SamplingParams(
+    temperature=0.8,
+    top_p=0.95,
+    max_tokens=1024,
+)
+output = model.fast_generate(
+    text,
+    sampling_params=sampling_params,
+    lora_request=model.load_lora("grpo_saved_lora"),
+)[0].outputs[0].text
+print(output)
+```
+</details>
+## **Recommended Prompt**
+Use the following prompt for detailed and personalized results. This is the recommended format as the model was fine-tuned to respond in this structure:
+```text
+You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:
+<reasoning>
+...
+</reasoning>
+<answer>
+...
+</answer>
+```
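
Because the recommended prompt asks for `<reasoning>` and `<answer>` tags, downstream code will usually want to separate the two parts. The following is a minimal sketch of one way to do that; `split_reasoning_and_answer` and the sample string are hypothetical and are not part of the model repository or this commit.

```python
import re

# Hypothetical helper for illustration only; not shipped with the model.
_TAG_PATTERN = re.compile(
    r"<reasoning>\s*(?P<reasoning>.*?)\s*</reasoning>"
    r"\s*<answer>\s*(?P<answer>.*?)\s*</answer>",
    re.DOTALL,
)


def split_reasoning_and_answer(generated_text):
    """Return (reasoning, answer), or None when the tags are missing."""
    match = _TAG_PATTERN.search(generated_text)
    if match is None:
        return None
    return match.group("reasoning"), match.group("answer")


# Made-up sample text in the recommended format, not real model output.
sample = (
    "<reasoning>\nSubtract 3, then divide by 2.\n</reasoning>\n"
    "<answer>\nx = 0.5\n</answer>"
)
parsed = split_reasoning_and_answer(sample)
```

The helper takes the first tagged pair it finds, so it is best applied to the newly generated text only. If the decoded string still contains the system prompt, the placeholder tags in that prompt would match first.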