base_model:
  - TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
  - TachyHealth/medical_grpo
  - TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
  - gazal-r1
  - grpo
  - qwen3
  - conversational
  - medical
  - clinical
  - healthcare
  - reasoning

Gazal-R1-32B: Medical Reasoning Language Model

The model was presented in the paper Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training.

Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model specifically designed for medical reasoning and clinical decision-making. Built upon Qwen 3 32B, Gazal-R1 demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized medical domains.

Key features include:

  • 🔬 Medical Expertise: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
  • 🧠 Transparent Reasoning: Structured clinical thinking with step-by-step explanations in <think></think> tags, following established clinical reasoning frameworks
  • 📊 State-of-the-Art Performance: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger
  • ⚡ Parameter Efficiency: Advanced training techniques including Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
  • 🎯 Alignment Optimization: Refined through Group Relative Policy Optimization (GRPO) with a multi-component reward system
  • 🌍 Medical Knowledge: Comprehensive understanding across multiple medical specialties and clinical scenarios

Model Overview

Gazal-R1-32B has the following characteristics:

  • Type: Causal Language Model (Medical Reasoning Specialist)
  • Base Model: Qwen 3 32B
  • Training Stages: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
  • Number of Parameters: 32.8B
  • Number of Parameters (Non-Embedding): 31.2B
  • Context Length: 32,768 tokens natively, extensible to 131,072 with YaRN
  • Training Data: 107,033 synthetic medical reasoning examples + MedReason dataset (32,682 examples)
  • Fine-tuning Method: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
  • Alignment: Group Relative Policy Optimization (GRPO)

For detailed methodology, training insights, and comprehensive evaluation, please refer to our technical report.
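
The two context-length figures in the overview imply a YaRN scaling factor of 4 (131,072 / 32,768). The sketch below derives that factor and builds a `rope_scaling` dictionary. The key names follow the convention used by Qwen3-style configurations in transformers, which is an assumption here; `yarn_rope_scaling` is a hypothetical helper, so check the result against the model's own `config.json` before relying on it.

```python
# Assumption: key names follow the YaRN rope_scaling convention of
# Qwen3-style configs; this model card does not spell them out.
NATIVE_CONTEXT = 32_768     # tokens, from the model overview
EXTENDED_CONTEXT = 131_072  # tokens, from the model overview


def yarn_rope_scaling(native: int, target: int) -> dict:
    """Build a YaRN rope_scaling dict that stretches `native` to `target` tokens."""
    if native <= 0 or target < native:
        raise ValueError("target must be at least the (positive) native context")
    return {
        "rope_type": "yarn",
        "factor": target / native,
        "original_max_position_embeddings": native,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, EXTENDED_CONTEXT)
print(rope_scaling)
```

A dictionary of this shape would typically be placed under `rope_scaling` in the model configuration, and only when prompts actually exceed the native window.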

Performance Results

Gazal-R1 posts the highest score on three of the four medical benchmarks below (accuracy, %); Llama 3.1 405B Instruct leads on MedMCQA:

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|---|---|---|---|---|---|
| Gazal-R1 (Final) | 32B | 81.6 | 71.9 | 87.1 | 79.6 |
| Gazal-R1 (SFT-only) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

Key Achievements:

  • 🥇 Highest scores on MMLU Pro (Medical), MedQA, and PubMedQA
  • 📈 Gains from GRPO training over the SFT-only model: +2.3 points on MMLU Pro (Medical) and +2.0 points on PubMedQA
  • 🚀 Outperforms models up to 12× larger (Llama 3.1 405B) on MMLU Pro (Medical), MedQA, and PubMedQA
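
The figures quoted above can be rechecked from the results table. The short script below copies the two Gazal-R1 rows, computes the change from the SFT-only to the final model on each benchmark, and derives the size ratio from the nominal parameter counts (405B and 32B). The variable names are illustrative only.

```python
# Scores copied from the results table (accuracy, %).
final = {"MMLU Pro (Medical)": 81.6, "MedMCQA": 71.9, "MedQA": 87.1, "PubMedQA": 79.6}
sft_only = {"MMLU Pro (Medical)": 79.3, "MedMCQA": 72.3, "MedQA": 86.9, "PubMedQA": 77.6}

# Change attributable to the GRPO stage, in percentage points.
grpo_gain = {name: round(final[name] - sft_only[name], 1) for name in final}

# Nominal size ratio between Llama 3.1 405B and a 32B model.
size_ratio = 405 / 32

for name, delta in grpo_gain.items():
    print(f"{name}: {delta:+.1f} points")
print(f"Size ratio: {size_ratio:.1f}x")
```

The same calculation shows that the GRPO stage did not help uniformly: MedMCQA moves from 72.3 to 71.9 in the table.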

Quickstart

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels. 
ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response with medical reasoning
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)

Structured Medical Reasoning Format

Gazal-R1 is trained to provide structured medical reasoning in the following format:

<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
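
Because the reasoning trace is delimited by <think></think> tags, downstream code can separate it from the final assessment. The following is a minimal sketch, assuming the decoded response still contains the literal tags (that is, the tokenizer does not strip them as special tokens); `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


sample = (
    "<think>\n"
    "Step 1: Analyze presenting symptoms\n"
    "Step 2: Interpret ECG findings\n"
    "</think>\n\n"
    "## Clinical Assessment\n\n"
    "**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall"
)

reasoning, answer = split_reasoning(sample)
print("Reasoning steps:", len(reasoning.splitlines()))
print("Answer starts with:", answer.splitlines()[0])
```

Keeping the two parts separate makes it easier to log the reasoning for review while showing only the assessment to an end user.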

Training Methodology

Stage 1: Supervised Fine-Tuning (SFT)

  • Dataset: 107,033 synthetic medical reasoning examples + MedReason dataset
  • Techniques: DoRA + rsLoRA with rank 256
  • Focus: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
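
To see why rank stabilization matters at rank 256: standard LoRA scales the low-rank update by alpha / r, while rsLoRA scales it by alpha / sqrt(r), so the update does not shrink as quickly when the rank grows. The sketch below compares the two factors. The alpha value is hypothetical, since this model card does not state it, and `lora_scale` is an illustrative helper rather than a library function.

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA
    return alpha / rank                 # standard LoRA


RANK = 256    # from the SFT setup above
ALPHA = 32.0  # hypothetical value, not stated in this model card

standard = lora_scale(ALPHA, RANK, rank_stabilized=False)
stabilized = lora_scale(ALPHA, RANK, rank_stabilized=True)

print(f"standard LoRA scale: {standard}")
print(f"rsLoRA scale:        {stabilized}")
print(f"ratio:               {stabilized / standard}")
```

Whatever alpha is used, the ratio between the two factors equals sqrt(r), which is 16 at rank 256.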

Stage 2: Group Relative Policy Optimization (GRPO)

  • Algorithm: Value-function-free reinforcement learning
  • Dataset: UltraMedical subset (32K medical MCQs)
  • Rewards: Multi-component system (accuracy, format, length control, repetition penalty)
  • Improvements: Enhanced reasoning quality and format adherence
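
"Value-function-free" means that GRPO does not train a critic: each sampled completion is scored, and its advantage is the reward standardized against the other completions drawn for the same prompt. The sketch below illustrates the idea with a toy reward that combines the four components listed above. All weights, thresholds, and function names are hypothetical and are not taken from the technical report.

```python
from statistics import mean, pstdev


def composite_reward(correct: bool, well_formatted: bool, n_tokens: int,
                     repeated_fraction: float, max_tokens: int = 2048) -> float:
    """Toy multi-component reward; every weight here is a made-up placeholder."""
    reward = 1.0 if correct else 0.0                        # accuracy
    reward += 0.2 if well_formatted else 0.0                # format
    reward -= 0.2 * max(0.0, n_tokens / max_tokens - 1.0)   # length control
    reward -= 0.5 * repeated_fraction                       # repetition penalty
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions sampled for a single multiple-choice question.
rewards = [
    composite_reward(True, True, 900, 0.0),
    composite_reward(True, False, 1200, 0.1),
    composite_reward(False, True, 800, 0.0),
    composite_reward(False, False, 3000, 0.4),
]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f}  advantage={a:+.3f}")
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged, with no learned value estimate involved.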

Model Capabilities

Clinical Reasoning Types

  1. Diagnostic Reasoning: Systematic symptom analysis → differential diagnosis
  2. Treatment Planning: Evidence-based therapy selection with patient-specific factors
  3. Decision-Making Under Uncertainty: Risk assessment and clinical judgment
  4. Prognostic Assessment: Outcome prediction based on clinical evidence

Medical Specialties Covered

  • Internal Medicine
  • Emergency Medicine
  • Cardiology
  • Pulmonology
  • Infectious Disease
  • Pharmacology
  • Pathophysiology
  • Clinical Laboratory Medicine

Limitations and Important Disclaimers

⚠️ Critical Safety Information

  • NOT A MEDICAL DEVICE: Gazal-R1 is a research model and is NOT intended for direct clinical use, diagnosis, or treatment planning
  • REQUIRES PROFESSIONAL VERIFICATION: All outputs must be independently verified by qualified medical professionals
  • NO REAL-TIME UPDATES: Knowledge is static and does not reflect the latest medical research or guidelines

Technical Limitations

  • Knowledge Cutoff: Training data reflects medical knowledge up to the training date
  • Hallucination Risk: May generate plausible-sounding but factually incorrect information
  • Evaluation Scope: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
  • Regional Bias: Training data may contain geographical or demographic biases

Ethical Considerations

  • Professional Responsibility: Final medical decisions must always rest with qualified healthcare providers
  • Accountability: Users assume responsibility for verifying and appropriately applying model outputs
  • Patient Safety: Never use for emergency medical situations or time-critical decisions

Use Cases

Research and Education

  • Medical education and training
  • Clinical reasoning research
  • Medical knowledge assessment
  • Academic medical writing assistance

Professional Support (With Supervision)

  • Literature review assistance
  • Clinical case analysis support
  • Medical documentation aid
  • Differential diagnosis exploration

NOT Suitable For

  • Direct patient care
  • Emergency medical decisions
  • Replacing clinical judgment
  • Unsupervised medical advice

Citation

If you find Gazal-R1 helpful in your research, please cite our work:

@article{gazal-r1-2025,
    title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
    author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
    journal={arXiv preprint arXiv:2506.21594},
    year={2025},
    url={https://arxiv.org/abs/2506.21594}
}

Model Access

License

This model is released under the Apache 2.0 License. Please review the license terms before use.

Contact

For questions about Gazal-R1, please contact:


Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.