TethysAI Vortex Reasoning

Introduction

TethysAI Vortex Reasoning is an experimental model that advances the structured reasoning capabilities pioneered by TethysAI Base Reasoning. While the Base Reasoning model used Group Relative Policy Optimization (GRPO) to strengthen step-by-step logical thought in the style of DeepSeek-R1, this model takes a different approach: it drops GRPO entirely and relies on high-quality Supervised Fine-Tuning (SFT) instead.

The core objective was to investigate whether deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets. The results were highly promising: the model questions itself internally, reasons in greater depth, and consistently generates structured, logical responses.


Key Features

1️⃣ Advanced Reasoning Without GRPO

This model does not rely on GRPO, yet it achieves a similar self-reflective thought process, demonstrating that structured reasoning can be induced through high-quality SFT alone.

2️⃣ Self-Questioning and Iterative Thinking

The model actively asks itself intermediate questions before answering, mimicking the deep reflection-based thought process of models like DeepSeek-R1. This leads to more reliable and well-structured responses.

3️⃣ High-Quality SFT on a Curated Dataset

To compensate for the lack of reinforcement learning, we used an extensive dataset tailored for deep reasoning. This dataset includes:

  • Mathematical proofs & logical puzzles
  • Complex multi-step problem-solving tasks
  • Philosophical and ethical reasoning
  • Scientific hypothesis evaluation

4️⃣ Implicit Use of <think> and <answer> Tokens

The model internally uses special reasoning markers (<think> and <answer>) to structure its responses, though these may not always be visible in the final output. This ensures a consistent and methodical approach to answering questions.
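
For illustration, a response with the markers exposed might look like the following. The reasoning text and wording here are hypothetical; only the <think>/<answer> tag convention comes from the description above:

<think>
The equation is x + 3 = 10. What single step isolates x? Subtracting 3 from both sides gives x = 7. Check: 7 + 3 = 10, so the result is consistent.
</think>
<answer>
x = 7
</answer>

A plausible reading is that the SFT targets in the curated dataset follow this same shape, which would explain how the self-questioning behavior emerged without reinforcement learning.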

5️⃣ Part of the TethysAI Vortex Family

This model belongs to the TethysAI Vortex series, a collection of fine-tuned models pushing the boundaries of SFT-based reasoning without reinforcement learning.


Breakthrough Insights

| Feature | Base Reasoning (GRPO) | Vortex Reasoning (SFT-Only) |
| --- | --- | --- |
| Structured Thought Process | ✅ Yes (GRPO) | ✅ Yes (SFT) |
| Self-Reflection & Questioning | ✅ Strong | ✅ Equally Strong |
| GRPO-Free Optimization | ❌ No | ✅ Achieved via SFT |
| Step-by-Step Problem Solving | ✅ Yes | ✅ Yes |
| Use of <think> and <answer> | ✅ Explicit | ✅ Implicit (internal use) |

Key Takeaway: This experiment shows that reinforcement learning is not the only pathway to advanced reasoning. With the right dataset and SFT strategy, a model can learn to self-reflect and deduce answers in a structured, logical manner.


How to Use

Running with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model & tokenizer (the published weights are FP16, so load in half precision)
model_name = "saishshinde15/TethysAI_Vortex_Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Prepare input prompt
messages = [
    {"role": "system", "content": "You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."},
    {"role": "user", "content": "If x + 3 = 10, what is x?"}
]

# Apply the chat template, then tokenize (keep the attention mask for generate)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate a response and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)

print(response)
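
Because the reasoning markers are used internally and may occasionally surface in the decoded text, a small post-processing step can separate the final answer from any exposed reasoning. Below is a minimal sketch; split_reasoning is a helper written for this card rather than part of the transformers API, and it assumes the <think>/<answer> tags described earlier:

import re

def split_reasoning(text):
    """Split generated text into (reasoning, answer) using the model's markers."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else None
    # Fall back to the full text when no <answer> block is present
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final

reasoning, final_answer = split_reasoning(response)
print(final_answer)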


Model Details

  • Base model: Qwen/Qwen2.5-3B
  • Model size: 3.09B params
  • Tensor type: FP16 (Safetensors)