---
base_model:
- TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
- TachyHealth/medical_grpo
- TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- gazal-r1
- grpo
- qwen3
- conversational
- medical
- clinical
- healthcare
- reasoning
---

# Gazal-R1-32B: Medical Reasoning Language Model

The model was presented in the paper [Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training](https://huggingface.co/papers/2506.21594).

## Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model designed for medical reasoning and clinical decision-making. Built on Qwen 3 32B, Gazal-R1 shows that targeted training can enable a mid-sized model to outperform much larger models on specialized medical benchmarks.

Key features include:

- **🔬 Medical Expertise**: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
- **🧠 Transparent Reasoning**: Structured, step-by-step clinical reasoning inside `<think>` tags, following established clinical reasoning frameworks
- **📊 State-of-the-Art Performance**: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger on these benchmarks
- **⚡ Parameter Efficiency**: Fine-tuned with Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
- **🎯 Alignment Optimization**: Refined through Group Relative Policy Optimization (GRPO) with a multi-component reward (accuracy, format, length control, and repetition penalty)
- **🌍 Medical Knowledge**: Coverage across multiple medical specialties and clinical scenarios

## Model Overview

**Gazal-R1-32B** has the following characteristics:

- **Type**: Causal Language Model (Medical Reasoning Specialist)
- **Base Model**: Qwen 3 32B
- **Training Stages**: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
- **Number of Parameters**: 32.8B
- **Number of Parameters (Non-Embedding)**: 31.2B
- **Context Length**: 32,768 tokens natively, extensible to 131,072 with YaRN
- **Training Data**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason) (32,682 examples)
- **Fine-tuning Method**: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
- **Alignment**: Group Relative Policy Optimization (GRPO)

For detailed methodology, training insights, and comprehensive evaluation, please refer to our [technical report](https://arxiv.org/abs/2506.21594).

## Performance Results

Gazal-R1 achieves strong performance across standard medical benchmarks. Scores are accuracy (%):

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|-------|------|-------------------|---------|-------|----------|
| **Gazal-R1 (Final)** | **32B** | **81.6** | **71.9** | **87.1** | **79.6** |
| [Gazal-R1 (SFT-only)](https://huggingface.co/TachyHealth/Gazal-R1-32B-sft-merged-preview) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

**Key Achievements:**

- 🥇 Highest scores among the compared models on MMLU Pro (Medical), MedQA, and PubMedQA
- 📈 Gains from GRPO training over the SFT-only model (+2.3 points on MMLU Pro, +2.0 points on PubMedQA)
- 🚀 Outperforms models up to 12× larger (Llama 3.1 405B) on MMLU Pro (Medical), MedQA, and PubMedQA

## Quickstart

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
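# Optional, assumption-based note on long inputs: the native context is
# 32,768 tokens, and this card states it can be extended to 131,072 tokens
# with YaRN. Assuming this checkpoint accepts the standard Qwen3 YaRN settings,
# one option is to add the following keyword argument to the
# `AutoModelForCausalLM.from_pretrained` call below; transformers uses it to
# override the matching attribute of the model config (4.0 × 32,768 = 131,072):
#     rope_scaling={"rope_type": "yarn", "factor": 4.0,
#                   "original_max_position_embeddings": 32768},
# Static YaRN scaling applies regardless of input length and may reduce
# quality on short inputs, so enable it only when long contexts are needed.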
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels.
ECG shows ST-segment elevation in leads II, III, and aVF.
What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response with medical reasoning
# (do_sample=True is needed for temperature/top_p/top_k to take effect)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20
)

# Keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)
```

### Structured Medical Reasoning Format

Gazal-R1 is trained to produce structured medical reasoning in the following format:

```
<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is highly characteristic of inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325 mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
```

## Training Methodology

### Stage 1: Supervised Fine-Tuning (SFT)

- **Dataset**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason)
- **Techniques**: DoRA + rsLoRA with rank 256
- **Focus**: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios

### Stage 2: Group Relative Policy Optimization (GRPO)

- **Algorithm**: Value-function-free reinforcement learning; each sampled response is scored relative to the other responses generated for the same prompt, so no separate critic (value) model is trained
- **Dataset**: UltraMedical subset (32K medical MCQs)
- **Rewards**: Multi-component system (accuracy, format, length control, repetition penalty)
- **Improvements**: Enhanced reasoning quality and format adherence

## Model Capabilities

### Clinical Reasoning Types

1. **Diagnostic Reasoning**: Systematic symptom analysis → differential diagnosis
2. **Treatment Planning**: Evidence-based therapy selection with patient-specific factors
3. **Decision-Making Under Uncertainty**: Risk assessment and clinical judgment
4. **Prognostic Assessment**: Outcome prediction based on clinical evidence

### Medical Specialties Covered

- Internal Medicine
- Emergency Medicine
- Cardiology
- Pulmonology
- Infectious Disease
- Pharmacology
- Pathophysiology
- Clinical Laboratory Medicine

## Limitations and Important Disclaimers

### ⚠️ Critical Safety Information

- **NOT A MEDICAL DEVICE**: Gazal-R1 is a research model and is **NOT** intended for direct clinical use, diagnosis, or treatment planning
- **REQUIRES PROFESSIONAL VERIFICATION**: All outputs must be independently verified by qualified medical professionals
- **NO REAL-TIME UPDATES**: Knowledge is static and does not reflect the latest medical research or guidelines

### Technical Limitations

- **Knowledge Cutoff**: Training data reflects medical knowledge up to the training date
- **Hallucination Risk**: May generate plausible-sounding but factually incorrect information
- **Evaluation Scope**: Primarily evaluated on multiple-choice questions; performance in real-world clinical scenarios may differ
- **Regional Bias**: Training data may contain geographical or demographic biases

### Ethical Considerations

- **Professional Responsibility**: Final medical decisions must always rest with qualified healthcare providers
- **Accountability**: Users assume responsibility for verifying and appropriately applying model outputs
- **Patient Safety**: Never use for emergency medical situations or time-critical decisions

## Use Cases

### Research and Education

- Medical education and training
- Clinical reasoning research
- Medical knowledge assessment
- Academic medical writing assistance

### Professional Support (With Supervision)

- Literature review assistance
- Clinical case analysis support
- Medical documentation aid
- Differential diagnosis exploration

### NOT Suitable For

- Direct patient care
- Emergency medical decisions
- Replacing clinical judgment
- Unsupervised medical advice

## Citation

If you find Gazal-R1 helpful in your research, please cite our work:

```bibtex
@article{gazal-r1-2025,
  title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
  author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
  journal={arXiv preprint arXiv:2506.21594},
  year={2025},
  url={https://arxiv.org/abs/2506.21594}
}
```

## Model Access

- **Model Weights**: Available on the Hugging Face Hub
- **Datasets**: Training datasets are available at [TachyHealth/structured_medical](https://huggingface.co/datasets/TachyHealth/structured_medical) and [TachyHealth/medical_grpo](https://huggingface.co/datasets/TachyHealth/medical_grpo)

## License

This model is released under the Apache 2.0 License. Please review the license terms before use.

## Contact

For questions about Gazal-R1, please contact:

- **Research Team**: TachyHealth
- **Website**: [https://tachyhealth.com/](https://tachyhealth.com/)
- **Gazal Platform**: [Gazal.ai](https://gazal.ai)

---

*Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.*