|
--- |
|
base_model: |
|
- Qwen/Qwen2.5-3B-Instruct |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- qwen2 |
|
- trl |
|
- grpo |
|
license: apache-2.0 |
|
language: |
|
- zho |
|
- eng |
|
- fra |
|
- spa |
|
- por |
|
- deu |
|
- ita |
|
- rus |
|
- jpn |
|
- kor |
|
- vie |
|
- tha |
|
- ara |
|
--- |
|
|
|
# Clyrai Secure Reasoning Model (formerly TBH.AI_Base_Reasoning)
|
|
|
- **Developed by:** Clyrai |
|
- **License:** apache-2.0 |
|
- **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct |
|
- **Fine-tuning Method:** GRPO (Group Relative Policy Optimization)
|
- **Inspired by:** DeepSeek-R1 |
|
|
|
## **Model Description** |
|
Clyrai Secure Reasoning Model is an AI model designed for secure, reliable, and structured reasoning. Fine-tuned from Qwen/Qwen2.5-3B-Instruct with GRPO, it strengthens logical reasoning, decision-making, and problem-solving while placing a strong emphasis on reducing hallucinations and preserving factual accuracy.
|
|
|
Unlike conventional language models that rely primarily on knowledge retrieval, Clyrai's model is designed to autonomously engage with complex problems, breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs advanced reinforcement learning methodologies that allow it to validate and refine its logical conclusions securely and effectively. |
|
|
|
This model is particularly suited for tasks requiring high-level reasoning, structured analysis, and problem-solving in critical domains such as cybersecurity, finance, and research. It is ideal for professionals and organizations seeking AI solutions that prioritize security, transparency, and truthfulness. |
|
|
|
## **Features** |
|
- **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency. |
|
- **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision. |
|
- **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses. |
|
- **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability. |
|
|
|
## **Limitations** |
|
- Requires well-structured prompts for optimal reasoning depth. |
|
- Not optimized for tasks requiring extensive factual recall beyond its training scope. |
|
- Reasoning quality reflects the GRPO reward design and fine-tuning data; domains outside that data may see weaker performance.
|
|
|
## **Usage** |
|
To use this model for secure text generation and reasoning tasks, follow the structure below: |
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("saishshinde15/Clyrai_Base_Reasoning")
model = AutoModelForCausalLM.from_pretrained("saishshinde15/Clyrai_Base_Reasoning")

# Prepare the input prompt using the chat template
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Solve 2x + 3 = 4 for x."},
], tokenize=False, add_generation_prompt=True)

# Tokenize input (keeps the attention mask alongside the input ids)
inputs = tokenizer(text, return_tensors="pt")

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = inputs.to(device)

# Generate a response; transformers' generate takes sampling arguments
# directly (vLLM's SamplingParams does not apply here)
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    max_new_tokens=1024,
)

# Decode only the newly generated tokens and print them
output_text = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(output_text)
|
``` |
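
The reply comes back as a single string with the chain of thought wrapped in `<reasoning>` tags and the final result in `<answer>` tags. The sketch below splits the two fields; the `parse_structured_reply` helper, its regex, and its fallback behaviour are illustrative, not part of the model's API:

```python
import re

def parse_structured_reply(reply: str) -> dict:
    """Split a model reply into its <reasoning> and <answer> blocks.

    Hypothetical helper: the tag format comes from the system prompt above.
    """
    fields = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", reply, re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    # Fall back to the raw reply if the model skipped the tags entirely
    if fields["answer"] is None:
        fields["answer"] = reply.strip()
    return fields

parsed = parse_structured_reply(output_text)
print(parsed["reasoning"])
print(parsed["answer"])
```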
|
|
|
<details> |
|
<summary>Fast inference</summary> |
|
|
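The snippet below appears to follow Unsloth's GRPO workflow: `fast_generate` and `load_lora` are methods of Unsloth's `FastLanguageModel` rather than plain `transformers`, so it assumes the model is loaded through Unsloth with `fast_inference=True` and that a GRPO LoRA adapter was saved locally as `grpo_saved_lora`.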
|
```python |
|
# Shell steps (run before the Python below): install Unsloth plus vLLM
# pip install unsloth
# pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# fast_generate and load_lora are Unsloth FastLanguageModel methods, so the
# model must be loaded through Unsloth with the vLLM backend enabled
from unsloth import FastLanguageModel
from vllm import SamplingParams

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="saishshinde15/Clyrai_Base_Reasoning",
    max_seq_length=2048,
    fast_inference=True,  # backs fast_generate with vLLM
)

# SYSTEM_PROMPT is the same reasoning/answer template defined above
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Solve 2x + 3 = 4 for x."},
], tokenize=False, add_generation_prompt=True)

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1024,
)
output = model.fast_generate(
    text,
    sampling_params=sampling_params,
    # "grpo_saved_lora" is a LoRA adapter saved locally during GRPO training
    lora_request=model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text

print(output)
|
``` |
|
</details> |
|
|
|
## **Recommended Prompt**
|
Use the following system prompt for detailed, well-structured results. This is the recommended format, as the model was fine-tuned to respond in this structure:
|
|
|
```text
|
You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format: |
|
|
|
<reasoning> |
|
... |
|
</reasoning> |
|
<answer> |
|
... |
|
</answer> |
|
``` |
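
As one option for higher-throughput serving, the sketch below wires the recommended prompt into vLLM's offline `LLM` API, reusing the model id and sampling values from the sections above. Treat it as an untested sketch rather than an official serving recipe:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

MODEL_ID = "saishshinde15/Clyrai_Base_Reasoning"

RECOMMENDED_PROMPT = """You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm = LLM(model=MODEL_ID)

# Render the chat template to a plain prompt string for vLLM
prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": RECOMMENDED_PROMPT},
    {"role": "user", "content": "Solve 2x + 3 = 4 for x."},
], tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    [prompt],
    SamplingParams(temperature=0.8, top_p=0.95, max_tokens=1024),
)
print(outputs[0].outputs[0].text)
```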
|
|