TethysAI Vortex Reasoning

Introduction

TethysAI Vortex Reasoning is an experimental model that advances the structured reasoning capabilities pioneered by TethysAI Base Reasoning. While the Base Reasoning model used Group Relative Policy Optimization (GRPO) to strengthen step-by-step logical thought in the style of DeepSeek-R1, this model takes a different approach: it drops GRPO entirely and relies on high-quality Supervised Fine-Tuning (SFT) instead.

The core objective was to investigate whether deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets. The results were highly promising: the model questions itself internally, reasons in greater depth, and consistently generates structured, logical responses.


Key Features

1️⃣ Advanced Reasoning Without GRPO

This model does not rely on GRPO, yet it achieves a similar self-reflective thought process, demonstrating that structured reasoning can be induced through high-quality SFT alone.

2️⃣ Self-Questioning and Iterative Thinking

The model actively asks itself intermediate questions before answering, mimicking the deep reflection-based thought process of models like DeepSeek-R1. This leads to more reliable and well-structured responses.

3️⃣ High-Quality SFT on a Curated Dataset

To compensate for the lack of reinforcement learning, we used an extensive dataset tailored for deep reasoning. This dataset includes:

  • Mathematical proofs & logical puzzles
  • Complex multi-step problem-solving tasks
  • Philosophical and ethical reasoning
  • Scientific hypothesis evaluation

4️⃣ Implicit Use of <think> and <answer> Tokens

The model internally uses special reasoning markers (<think> and <answer>) to structure its responses, though these may not always be visible in the final output. This ensures a consistent and methodical approach to answering questions.
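
For illustration, a response with the markers exposed might look like the following. The reasoning text and wording here are hypothetical; only the <think>/<answer> tag convention comes from the description above:

<think>
The equation is x + 3 = 10. What single step isolates x? Subtracting 3 from both sides gives x = 7. Check: 7 + 3 = 10, so the result is consistent.
</think>
<answer>
x = 7
</answer>

A plausible reading is that the SFT targets in the curated dataset follow this same shape, which would explain how the self-questioning behavior emerged without reinforcement learning.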

5️⃣ Part of the TethysAI Vortex Family

This model belongs to the TethysAI Vortex series, a collection of fine-tuned models pushing the boundaries of SFT-based reasoning without reinforcement learning.


Breakthrough Insights

| Feature | Base Reasoning (GRPO) | Vortex Reasoning (SFT-Only) |
| --- | --- | --- |
| Structured Thought Process | ✅ Yes (GRPO) | ✅ Yes (SFT) |
| Self-Reflection & Questioning | ✅ Strong | ✅ Equally Strong |
| GRPO-Free Optimization | ❌ No | ✅ Achieved via SFT |
| Step-by-Step Problem Solving | ✅ Yes | ✅ Yes |
| Use of <think> and <answer> | ✅ Explicit | ✅ Implicit (internal use) |

Key Takeaway: This experiment shows that reinforcement learning is not the only pathway to advanced reasoning. With the right dataset and SFT strategy, a model can learn to self-reflect and deduce answers in a structured, logical manner.


How to Use

Running with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model & tokenizer (the published weights are FP16, so load in half precision)
model_name = "saishshinde15/TethysAI_Vortex_Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Prepare input prompt
messages = [
    {"role": "system", "content": "You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."},
    {"role": "user", "content": "If x + 3 = 10, what is x?"}
]

# Apply the chat template, then tokenize (keep the attention mask for generate)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate a response and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)

print(response)
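
Because the reasoning markers are used internally and may occasionally surface in the decoded text, a small post-processing step can separate the final answer from any exposed reasoning. Below is a minimal sketch; split_reasoning is a helper written for this card rather than part of the transformers API, and it assumes the <think>/<answer> tags described earlier:

import re

def split_reasoning(text):
    """Split generated text into (reasoning, answer) using the model's markers."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else None
    # Fall back to the full text when no <answer> block is present
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final

reasoning, final_answer = split_reasoning(response)
print(final_answer)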


Model Details

  • Base model: Qwen/Qwen2.5-3B
  • Model size: 3.09B params
  • Tensor type: FP16 (Safetensors)