metadata

base_model:
  - TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
  - TachyHealth/medical_grpo
  - TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
  - gazal-r1
  - grpo
  - qwen3
  - conversational
  - medical
  - clinical
  - healthcare
  - reasoning

Gazal-R1-32B: Medical Reasoning Language Model

The model was presented in the paper Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training.

Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model designed for medical reasoning and clinical decision-making. Built on Qwen 3 32B, it shows that a two-stage pipeline (parameter-efficient supervised fine-tuning followed by GRPO) can let a mid-sized model outperform much larger models on several medical benchmarks.

Key features include:

🔬 Medical Expertise: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
🧠 Transparent Reasoning: Structured clinical thinking with step-by-step explanations in <think></think> tags, following established clinical reasoning frameworks
📊 State-of-the-Art Performance: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger on these three benchmarks
⚡ Parameter Efficiency: Advanced training techniques including Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
🎯 Alignment Optimization: Refined through Group Relative Policy Optimization (GRPO) with sophisticated multi-component reward systems
🌍 Medical Knowledge: Comprehensive understanding across multiple medical specialties and clinical scenarios

Model Overview

Gazal-R1-32B has the following characteristics:

Type: Causal Language Model (Medical Reasoning Specialist)
Base Model: Qwen 3 32B
Training Stages: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
Number of Parameters: 32.8B
Number of Parameters (Non-Embedding): 31.2B
Context Length: 32,768 tokens natively, extensible to 131,072 with YaRN
Training Data: 107,033 synthetic medical reasoning examples + MedReason dataset (32,682 examples)
Fine-tuning Method: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
Alignment: Group Relative Policy Optimization (GRPO)

For detailed methodology, training insights, and comprehensive evaluation, please refer to our technical report.
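
As a rough illustration of the context-length figures above, the sketch below derives the scaling factor that YaRN needs to stretch the native window to the extended one. The `rope_scaling` key names follow the convention documented for the upstream Qwen 3 models. Whether this checkpoint expects exactly the same settings is an assumption, so check the repository's `config.json` before relying on it.

```python
# Sketch: derive the YaRN scaling factor from the context lengths listed above.
NATIVE_CONTEXT = 32_768      # native context length stated in this card
EXTENDED_CONTEXT = 131_072   # extended context length stated in this card


def yarn_rope_scaling(native: int, extended: int) -> dict:
    """Build a rope_scaling dict that stretches `native` tokens to `extended`.

    The key names mirror the upstream Qwen 3 convention; treat them as an
    assumption for this particular checkpoint.
    """
    if native <= 0 or extended < native:
        raise ValueError("extended must be >= native, and native must be positive")
    return {
        "rope_type": "yarn",
        "factor": extended / native,
        "original_max_position_embeddings": native,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, EXTENDED_CONTEXT)
print(rope_scaling)
```

A static scaling factor applies to every input regardless of its length, so it is usually worth enabling only when prompts actually exceed the native window.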

Performance Results

Gazal-R1 achieves the highest score among the compared models on three of four standard medical benchmarks (all scores in %):

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|---|---|---|---|---|---|
| Gazal-R1 (Final) | 32B | 81.6 | 71.9 | 87.1 | 79.6 |
| Gazal-R1 (SFT-only) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

Key Achievements:

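🥇 Highest scores among the compared models on MMLU Pro (Medical), MedQA, and PubMedQA
📈 Gains from GRPO training over the SFT-only checkpoint: +2.3 points on MMLU Pro (Medical) and +2.0 points on PubMedQA, with a small gain on MedQA (+0.2) and a slight drop on MedMCQA (-0.4)
🚀 Outperforms Llama 3.1 405B Instruct, a model roughly 12× larger, on MMLU Pro (Medical), MedQA, and PubMedQA; Llama 3.1 405B remains ahead on MedMCQA (75.8 vs. 71.9)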
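
The comparisons above can be checked directly against the results table. The short script below copies three rows from the table and computes the per-benchmark differences and the parameter-count ratio. The dictionary layout and function name are illustrative only.

```python
# Scores (%) copied from the results table in this card.
BENCHMARKS = ("MMLU Pro (Medical)", "MedMCQA", "MedQA", "PubMedQA")
scores = {
    "Gazal-R1 (Final)":        dict(zip(BENCHMARKS, (81.6, 71.9, 87.1, 79.6))),
    "Gazal-R1 (SFT-only)":     dict(zip(BENCHMARKS, (79.3, 72.3, 86.9, 77.6))),
    "Llama 3.1 405B Instruct": dict(zip(BENCHMARKS, (70.2, 75.8, 81.9, 74.6))),
}


def delta(model_a: str, model_b: str) -> dict:
    """Per-benchmark difference (model_a minus model_b), in percentage points."""
    return {b: round(scores[model_a][b] - scores[model_b][b], 1) for b in BENCHMARKS}


grpo_gain = delta("Gazal-R1 (Final)", "Gazal-R1 (SFT-only)")
vs_405b = delta("Gazal-R1 (Final)", "Llama 3.1 405B Instruct")
size_ratio = 405 / 32  # nominal parameter counts, in billions

print("GRPO vs. SFT-only:", grpo_gain)
print("Final vs. Llama 3.1 405B:", vs_405b)
print(f"Size ratio: {size_ratio:.1f}x")
```

The differences are percentage points rather than relative percentages, and the sign on MedMCQA shows where the final model does not lead.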

Quickstart

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels. 
ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

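# Generate a response with sampling. do_sample=True is set explicitly so that
# temperature, top_p, and top_k take effect regardless of the default generation config.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20
)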

# Drop the prompt tokens so that only the newly generated tokens are decoded
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)

Structured Medical Reasoning Format

Gazal-R1 is trained to provide structured medical reasoning in the following format:

<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
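
Downstream code often needs the reasoning trace and the final assessment separately. The helper below is a minimal sketch of one way to split a response in the format shown above. The function name `split_reasoning` is hypothetical and not part of the model's tooling. It also tolerates a response that contains only the closing `</think>` tag, which can happen if a chat template places the opening tag in the prompt.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, final_answer).

    If no </think> tag is present, the reasoning is empty and the whole
    response is treated as the final answer.
    """
    if "</think>" not in response:
        return "", response.strip()
    head, _, tail = response.partition("</think>")
    # Remove the opening tag if the model emitted it as part of the response.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


example = (
    "<think>\n"
    "Step 1: Analyze presenting symptoms\n"
    "Step 2: Interpret ECG findings\n"
    "</think>\n\n"
    "## Clinical Assessment\n\n"
    "**Primary Diagnosis**: STEMI, inferior wall"
)

reasoning, answer = split_reasoning(example)
steps = [line for line in reasoning.splitlines() if line.startswith("Step")]
print(len(steps), "reasoning steps")
print(answer)
```

Keeping the reasoning separate makes it easier to log or review the trace without mixing it into the text shown to an end user.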

Training Methodology

Stage 1: Supervised Fine-Tuning (SFT)

Dataset: 107,033 synthetic medical reasoning examples + MedReason dataset
Techniques: DoRA + rsLoRA with rank 256
Focus: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
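
To show why rank stabilization matters at rank 256, the sketch below compares the standard LoRA scaling factor, alpha / r, with the rsLoRA factor, alpha / sqrt(r). The alpha value is hypothetical because this card does not state it. In the Hugging Face PEFT library, rsLoRA and DoRA are enabled with the `use_rslora` and `use_dora` flags on `LoraConfig`. Whether the authors used PEFT is not stated here.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


RANK = 256     # rank stated in this card
ALPHA = 16.0   # hypothetical value, not taken from the card

for rank in (8, 64, RANK):
    print(
        f"r={rank:>3}  LoRA scale={lora_scale(ALPHA, rank):.4f}  "
        f"rsLoRA scale={rslora_scale(ALPHA, rank):.4f}"
    )
```

With standard scaling, the update shrinks in proportion to the rank, so a high-rank adapter contributes very little unless alpha is raised to match. The square-root scaling keeps the update magnitude more stable as the rank grows.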

Stage 2: Group Relative Policy Optimization (GRPO)

Algorithm: Value-function-free reinforcement learning
Dataset: UltraMedical subset (32K medical MCQs)
Rewards: Multi-component system (accuracy, format, length control, repetition penalty)
Improvements: Enhanced reasoning quality and format adherence
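
The core idea of GRPO is that it needs no learned value function: several completions are sampled for the same prompt, each is scored, and each reward is normalized against the mean and standard deviation of its group. The sketch below illustrates this with two simplified reward components (accuracy and format). The weights, the regular expressions, and the omission of the length and repetition terms are all assumptions made for illustration, not the authors' reward implementation.

```python
import re
import statistics

# Hypothetical component weights; the actual values are not given in this card.
WEIGHTS = {"accuracy": 1.0, "format": 0.5}


def format_reward(completion: str) -> float:
    """1.0 if the completion starts with a <think>...</think> block followed by a non-empty answer."""
    pattern = r"\s*<think>.+?</think>\s*(\S.*)"
    return 1.0 if re.fullmatch(pattern, completion, re.DOTALL) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the last standalone option letter after the reasoning matches `gold`."""
    answer = completion.rpartition("</think>")[2]
    letters = re.findall(r"\b([A-E])\b", answer)
    return 1.0 if letters and letters[-1] == gold else 0.0


def total_reward(completion: str, gold: str) -> float:
    """Weighted sum of the two illustrative reward components."""
    return (WEIGHTS["accuracy"] * accuracy_reward(completion, gold)
            + WEIGHTS["format"] * format_reward(completion))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the group's mean and (population) standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one multiple-choice question whose answer is B.
group = [
    "<think>Inferior leads point to the RCA.</think> Answer: B",
    "<think>Could be pericarditis.</think> Answer: C",
    "Answer: B",  # correct, but missing the reasoning block
    "<think>Unsure.</think> Answer: D",
]
rewards = [total_reward(c, "B") for c in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Completions that score above the group mean receive positive advantages and are reinforced, while those below it are discouraged. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.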

Model Capabilities

Clinical Reasoning Types

Diagnostic Reasoning: Systematic symptom analysis → differential diagnosis
Treatment Planning: Evidence-based therapy selection with patient-specific factors
Decision-Making Under Uncertainty: Risk assessment and clinical judgment
Prognostic Assessment: Outcome prediction based on clinical evidence

Medical Specialties Covered

Internal Medicine
Emergency Medicine
Cardiology
Pulmonology
Infectious Disease
Pharmacology
Pathophysiology
Clinical Laboratory Medicine

Limitations and Important Disclaimers

⚠️ Critical Safety Information

NOT A MEDICAL DEVICE: Gazal-R1 is a research model and is NOT intended for direct clinical use, diagnosis, or treatment planning
REQUIRES PROFESSIONAL VERIFICATION: All outputs must be independently verified by qualified medical professionals
NO REAL-TIME UPDATES: Knowledge is static and does not reflect the latest medical research or guidelines

Technical Limitations

Knowledge Cutoff: Training data reflects medical knowledge up to the training date
Hallucination Risk: May generate plausible-sounding but factually incorrect information
Evaluation Scope: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
Regional Bias: Training data may contain geographical or demographic biases

Ethical Considerations

Professional Responsibility: Final medical decisions must always rest with qualified healthcare providers
Accountability: Users assume responsibility for verifying and appropriately applying model outputs
Patient Safety: Never use for emergency medical situations or time-critical decisions

Use Cases

Research and Education

Medical education and training
Clinical reasoning research
Medical knowledge assessment
Academic medical writing assistance

Professional Support (With Supervision)

Literature review assistance
Clinical case analysis support
Medical documentation aid
Differential diagnosis exploration

NOT Suitable For

Direct patient care
Emergency medical decisions
Replacing clinical judgment
Unsupervised medical advice

Citation

If you find Gazal-R1 helpful in your research, please cite our work:

@article{gazal-r1-2025,
    title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
    author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
    journal={arXiv preprint arXiv:2506.21594},
    year={2025},
    url={https://arxiv.org/abs/2506.21594}
}

Model Access

Model Weights: Available on Hugging Face Hub
Datasets: Training datasets available at TachyHealth/structured_medical and TachyHealth/medical_grpo

License

This model is released under the Apache 2.0 License. Please review the license terms before use.

Contact

For questions about Gazal-R1, please contact:

Research Team: TachyHealth
Website: https://tachyhealth.com/
Gazal Platform: Gazal.ai

Developed by the TachyHealth Research Team. Gazal-R1 is intended to advance medical AI reasoning while keeping qualified medical professionals responsible for oversight.