---
base_model:
- TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
- TachyHealth/medical_grpo
- TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- gazal-r1
- grpo
- qwen3
- conversational
- medical
- clinical
- healthcare
- reasoning
---

# Gazal-R1-32B: Medical Reasoning Language Model

The model was presented in the paper [Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training](https://huggingface.co/papers/2506.21594).

<a href="https://gazal.ai/" target="_blank" style="margin: 0px;">
    <img alt="Gazal AI" src="./logo.png" style=" width: 70%;" />
</a>


## Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model specifically designed for medical reasoning and clinical decision-making. Built upon Qwen 3 32B, Gazal-R1 demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized medical domains.

Key features include:

- **🔬 Medical Expertise**: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
- **🧠 Transparent Reasoning**: Structured clinical thinking with step-by-step explanations in `<think></think>` tags, following established clinical reasoning frameworks
- **📊 State-of-the-Art Performance**: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger
- **⚡ Parameter Efficiency**: Advanced training techniques including Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
- **🎯 Alignment Optimization**: Refined through Group Relative Policy Optimization (GRPO) with a multi-component reward covering accuracy, format, length control, and repetition
- **🌍 Medical Knowledge**: Comprehensive understanding across multiple medical specialties and clinical scenarios

## Model Overview

**Gazal-R1-32B** has the following characteristics:
- **Type**: Causal Language Model (Medical Reasoning Specialist)
- **Base Model**: Qwen 3 32B
- **Training Stages**: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
- **Number of Parameters**: 32.8B
- **Number of Parameters (Non-Embedding)**: 31.2B
- **Context Length**: 32,768 tokens natively, extensible to 131,072 with YaRN
- **Training Data**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason) (32,682 examples)
- **Fine-tuning Method**: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
- **Alignment**: Group Relative Policy Optimization (GRPO)
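
The 131,072-token figure corresponds to YaRN RoPE scaling with a factor of 4 (4 × 32,768). The sketch below assumes that Gazal-R1 keeps Qwen 3's `rope_scaling` format unchanged, since the card does not spell out the exact settings:

```python
# YaRN settings in the format Qwen 3 documents for transformers' rope_scaling;
# assumed (not confirmed by this card) to apply unchanged to Gazal-R1.
NATIVE_CONTEXT = 32_768
TARGET_CONTEXT = 131_072

rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CONTEXT / NATIVE_CONTEXT,  # 4.0
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

# Config attributes passed to from_pretrained override config.json, e.g.:
# model = AutoModelForCausalLM.from_pretrained(
#     "TachyHealth/Gazal-R1-32B-GRPO-preview",
#     rope_scaling=rope_scaling, torch_dtype="auto", device_map="auto",
# )
print(rope_scaling)
```

Qwen 3's own guidance notes that this static YaRN setting applies to every input regardless of length and may slightly hurt performance on short texts. For that reason, it is usually worth enabling only when long contexts are actually needed.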

For detailed methodology, training insights, and comprehensive evaluation, please refer to our [technical report](https://arxiv.org/abs/2506.21594).

## Performance Results

Gazal-R1 achieves the highest score of all compared models on three of the four standard medical benchmarks below (accuracy, %):

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|-------|------|-------------------|---------|-------|----------|
| **Gazal-R1 (Final)** | **32B** | **81.6** | **71.9** | **87.1** | **79.6** |
| [Gazal-R1 (SFT-only)](https://huggingface.co/TachyHealth/Gazal-R1-32B-sft-merged-preview) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

**Key Achievements:**
- 🥇 Highest scores on MMLU Pro (Medical), MedQA, and PubMedQA
- 📈 Gains from GRPO training over the SFT-only checkpoint (+2.3 percentage points on MMLU Pro (Medical), +2.0 on PubMedQA)
- 🚀 Outperforms models up to 12× larger (Llama 3.1 405B) on MMLU Pro (Medical), MedQA, and PubMedQA
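
The GRPO gains are absolute differences between the Final and SFT-only rows of the table, not relative percentages. The following check uses only the table's numbers. It also shows the one benchmark (MedMCQA) where the GRPO stage scored slightly lower:

```python
# Scores copied from the table above (accuracy, %).
final = {"MMLU Pro (Medical)": 81.6, "MedMCQA": 71.9, "MedQA": 87.1, "PubMedQA": 79.6}
sft_only = {"MMLU Pro (Medical)": 79.3, "MedMCQA": 72.3, "MedQA": 86.9, "PubMedQA": 77.6}

# GRPO gain per benchmark in percentage points (absolute difference)
grpo_delta = {name: round(final[name] - sft_only[name], 1) for name in final}
print(grpo_delta)
# {'MMLU Pro (Medical)': 2.3, 'MedMCQA': -0.4, 'MedQA': 0.2, 'PubMedQA': 2.0}

# Parameter-count ratio behind the "up to 12x larger" claim
print(round(405 / 32, 1))  # 12.7
```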

## Quickstart

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels. 
ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

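# Generate response with medical reasoning.
# do_sample=True is needed for temperature/top_p/top_k to take effect;
# raise max_new_tokens if the reasoning is cut off before the final answer.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20
)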

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)
```

### Structured Medical Reasoning Format

Gazal-R1 is trained to reason step by step inside `<think></think>` tags before giving a structured clinical answer. An illustrative response to the prompt above looks like this:

```
<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
```
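
The reasoning is wrapped in `<think></think>` tags, so the decoded `response` from the Quickstart can be split into the reasoning trace and the final assessment. This sketch makes two assumptions. First, the tags survive `skip_special_tokens=True`: they are not flagged as special tokens in the Qwen 3 tokenizer. If they are stripped in your setup, decode with `skip_special_tokens=False`. Second, the opening `<think>` tag may or may not appear in the output, depending on the chat template. `split_reasoning` is a hypothetical helper, not part of `transformers`:

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, final_answer) at the last </think>."""
    text = response.strip()
    head, sep, tail = text.rpartition("</think>")
    if sep:
        # Drop the opening tag if the model emitted it (the template may put it in the prompt instead)
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if text.startswith("<think>"):
        # Opening tag without a closing tag: generation was probably cut off by max_new_tokens
        return text[len("<think>"):].strip(), ""
    # No reasoning block at all: treat everything as the answer
    return "", text


sample = "<think>\nStep 1: Analyze presenting symptoms\n</think>\n\n## Clinical Assessment\n..."
reasoning, answer = split_reasoning(sample)
print(reasoning)  # Step 1: Analyze presenting symptoms
print(answer)
```

With the Quickstart variables, `reasoning, assessment = split_reasoning(response)` separates the `<think>` trace from the clinical assessment. An empty assessment is a signal to raise `max_new_tokens`.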

## Training Methodology

### Stage 1: Supervised Fine-Tuning (SFT)
- **Dataset**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason)
- **Techniques**: DoRA + rsLoRA with rank 256
- **Focus**: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
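
DoRA and rsLoRA change how the low-rank update is applied. rsLoRA scales the update B·A by α/√r instead of LoRA's α/r. This keeps the update from shrinking at high ranks such as the rank 256 used here. DoRA splits each weight into a magnitude vector m and a direction: W′ = m · (W₀ + ΔW) / ‖W₀ + ΔW‖_c, with the norm taken column-wise. The sketch below illustrates both formulas in plain Python. The α value is a hypothetical placeholder, since the card does not report it. Hugging Face PEFT exposes these techniques as the `use_rslora` and `use_dora` options of `LoraConfig`.

```python
import math


def lora_scaling(alpha: float, rank: int, rank_stabilized: bool) -> float:
    """Factor multiplied into the low-rank update B @ A."""
    # Standard LoRA divides by r; rsLoRA divides by sqrt(r)
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def dora_weight(w0, delta, magnitude):
    """DoRA recomposition W' = m * V / ||V||_c with V = W0 + delta (lists of rows)."""
    rows, cols = len(w0), len(w0[0])
    v = [[w0[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]
    # Column-wise L2 norm of the direction matrix
    col_norms = [math.sqrt(sum(v[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    # Rescale each unit-norm column by its learned magnitude
    return [[magnitude[j] * v[i][j] / col_norms[j] for j in range(cols)] for i in range(rows)]


ALPHA = 16.0  # hypothetical value; the card does not report alpha
RANK = 256    # rank reported for Gazal-R1's SFT stage

print(lora_scaling(ALPHA, RANK, rank_stabilized=False))  # 0.0625
print(lora_scaling(ALPHA, RANK, rank_stabilized=True))   # 1.0
```

At rank 256, the standard LoRA factor is 16× smaller than the rsLoRA factor for the same α. This difference is why rank stabilization matters at high ranks.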

### Stage 2: Group Relative Policy Optimization (GRPO)
- **Algorithm**: Value-function-free reinforcement learning
- **Dataset**: UltraMedical subset (32K medical MCQs)
- **Rewards**: Multi-component system (accuracy, format, length control, repetition penalty)
- **Improvements**: Enhanced reasoning quality and format adherence
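
GRPO drops the learned value function. For each prompt, it samples a group of completions and scores them. Each completion's advantage is its reward relative to the group: reward minus the group mean, divided by the group standard deviation. The sketch below pairs that normalization with a toy version of the multi-component reward listed above. The component definitions, the 3-gram repetition measure, the word-count length penalty, and all weights are hypothetical illustrations, not Gazal-R1's actual reward functions:

```python
import re
import statistics

# Hypothetical weights for illustration; Gazal-R1's actual reward weights are not given here.
WEIGHTS = {"accuracy": 1.0, "format": 0.2, "length": 0.1, "repetition": 0.2}

# Reasoning block first, then a non-empty answer after </think>
THINK_RE = re.compile(r"^<think>.+?</think>\s*\S", re.DOTALL)


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Share of word n-grams that are duplicates (0.0 = no repetition)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def composite_reward(completion: str, predicted: str, gold: str, max_words: int = 1024) -> float:
    """Toy multi-component reward: accuracy, format, length control, repetition."""
    accuracy = 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0
    fmt = 1.0 if THINK_RE.match(completion.strip()) else 0.0
    # Penalize only the words beyond max_words, capped at -1.0
    overflow = max(0, len(completion.split()) - max_words)
    length = -min(1.0, overflow / max_words)
    repetition = -repeated_ngram_fraction(completion)
    return (WEIGHTS["accuracy"] * accuracy + WEIGHTS["format"] * fmt
            + WEIGHTS["length"] * length + WEIGHTS["repetition"] * repetition)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO advantage: each reward normalized by its group's mean and std (no critic)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of three sampled completions (gold answer: B)
group = [
    ("<think>ST elevation in II, III and aVF: inferior STEMI.</think> Answer: B", "B"),
    ("<think>Likely pericarditis.</think> Answer: C", "C"),
    ("Answer: B", "B"),
]
rewards = [composite_reward(text, pred, gold="B") for text, pred in group]
print(rewards)
print(group_relative_advantages(rewards))
```

Advantages are relative within a group. If every completion for a prompt receives the same reward, that prompt contributes no learning signal.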

## Model Capabilities

### Clinical Reasoning Types
1. **Diagnostic Reasoning**: Systematic symptom analysis → differential diagnosis
2. **Treatment Planning**: Evidence-based therapy selection with patient-specific factors
3. **Decision-Making Under Uncertainty**: Risk assessment and clinical judgment
4. **Prognostic Assessment**: Outcome prediction based on clinical evidence

### Medical Specialties Covered
- Internal Medicine
- Emergency Medicine  
- Cardiology
- Pulmonology
- Infectious Disease
- Pharmacology
- Pathophysiology
- Clinical Laboratory Medicine

## Limitations and Important Disclaimers

### ⚠️ Critical Safety Information
- **NOT A MEDICAL DEVICE**: Gazal-R1 is a research model and is **NOT** intended for direct clinical use, diagnosis, or treatment planning
- **REQUIRES PROFESSIONAL VERIFICATION**: All outputs must be independently verified by qualified medical professionals
- **NO REAL-TIME UPDATES**: Knowledge is static and does not reflect the latest medical research or guidelines

### Technical Limitations
- **Knowledge Cutoff**: Training data reflects medical knowledge up to the training date
- **Hallucination Risk**: May generate plausible-sounding but factually incorrect information
- **Evaluation Scope**: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
- **Regional Bias**: Training data may contain geographical or demographic biases

### Ethical Considerations
- **Professional Responsibility**: Final medical decisions must always rest with qualified healthcare providers
- **Accountability**: Users assume responsibility for verifying and appropriately applying model outputs
- **Patient Safety**: Never use for emergency medical situations or time-critical decisions

## Use Cases

### Research and Education
- Medical education and training
- Clinical reasoning research
- Medical knowledge assessment
- Academic medical writing assistance

### Professional Support (With Supervision)
- Literature review assistance
- Clinical case analysis support
- Medical documentation aid
- Differential diagnosis exploration

### NOT Suitable For
- Direct patient care
- Emergency medical decisions
- Replacing clinical judgment
- Unsupervised medical advice

## Citation

If you find Gazal-R1 helpful in your research, please cite our work:

```bibtex
@article{gazal-r1-2025,
    title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
    author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
    journal={arXiv preprint arXiv:2506.21594},
    year={2025},
    url={https://arxiv.org/abs/2506.21594}
}
```

## Model Access

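- **Model Weights**: Available on the Hugging Face Hub at [TachyHealth/Gazal-R1-32B-GRPO-preview](https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview)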
- **Datasets**: Training datasets available at [TachyHealth/structured_medical](https://huggingface.co/datasets/TachyHealth/structured_medical) and [TachyHealth/medical_grpo](https://huggingface.co/datasets/TachyHealth/medical_grpo)

## License

This model is released under the Apache 2.0 License. Please review the license terms before use.

## Contact

For questions about Gazal-R1, please contact:
- **Research Team**: TachyHealth
- **Website**: [https://tachyhealth.com/](https://tachyhealth.com/)
- **Gazal Platform**: [Gazal.ai](https://gazal.ai)

---

*Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.*