---
base_model: google/gemma-3-270m-it
tags:
- medical
- healthcare
- chain-of-thought
- cot
- reasoning
- sft
- lora
- unsloth
- gemma
- gemma-3
- long-context
- transformers
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
---

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/669777597cb32718c20d97e9/4emWK_PB-RrifIbrCUjE8.png"
     alt="Title card"
     style="width: 500px; height: auto; object-position: center top;">
</div>

# Medical-Diagnosis-COT-Gemma3-270M

**Alpha AI (www.alphaai.biz)** fine-tuned Gemma-3 270M for **medical question answering with explicit chain-of-thought (CoT)**. The model emits its reasoning inside `<think> ... </think>` followed by a final answer, making it well-suited for research on verifiable medical reasoning and for internal tooling where transparent intermediate steps are desired.

> ⚠️ **Not for clinical use.** This model is a research system, not a medical device. Do not use it for diagnosis or treatment decisions.

---

## TL;DR

* **Base**: `google/gemma-3-270m-it` (Gemma 3, 270M params). Access requires accepting Google’s Gemma license on Hugging Face.
* **Data**: `FreedomIntelligence/medical-o1-reasoning-SFT` (`en` split; columns: `Question`, `Complex_CoT`, `Response`).
* **Training**: SFT with LoRA via Unsloth; assistant-only loss; sequences templated in Gemma-3 chat format.
* **Objective**: Produce `<think>…</think>` (reasoning) + final answer.

---

## Intended Use & Limitations

**Intended use**

* Research on medical reasoning, CoT interpretability, prompt engineering, and dataset curation.
* Internal assistants that require visible reasoning traces (to be reviewed by humans).

**Out of scope / limitations**

* Not a substitute for clinician judgment; may hallucinate facts.
* No guarantee of guideline adherence (e.g., UpToDate/NICE/ACOG).
* Biases from synthetic/derived training data will propagate.

The dataset is licensed under **Apache-2.0**; the base model is covered by **Google's Gemma license**. Review both before use.

---

## Model Details

* **Model family**: Gemma 3 (270M).
* **Context window**: larger Gemma 3 variants support up to **128K tokens**; the 270M model supports **32K** (the practical context used during SFT was lower due to GPU limits).
* **Architecture**: decoder-only causal LM (Gemma family).
* **Fine-tuning**: Parameter-Efficient Fine-Tuning (LoRA) using Unsloth.

---

## Data

**Source**: `FreedomIntelligence/medical-o1-reasoning-SFT`, plus human-annotated and filtered medical diagnosis data. A loading sketch follows the list below.

* English subset (`en`): ~19.7k rows with fields **`Question`**, **`Complex_CoT`**, **`Response`**. Total dataset ≈ 90,120 rows across splits/languages.
* Built for advanced medical reasoning; constructed with GPT-4o and a verifier on **verifiable medical problems**. See the dataset card and paper.
* Human-annotated data: ~10k rows with the same fields.
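
A minimal loading sketch for the public portion using 🤗 `datasets` (`"en"` is the config name from the dataset card; the human-annotated rows are internal and not covered here):

```python
from datasets import load_dataset

# Public English config; columns match the fields listed above.
ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
print(ds.column_names)   # ['Question', 'Complex_CoT', 'Response']
print(ds[0]["Question"][:200])
```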

**Preprocessing**

* Each row rendered to a chat conversation under the **Gemma-3** template (a sketch follows this list).
* Assistant output concatenates `<think>{Complex_CoT}</think>\n{Response}`.
* Training objective masked to **assistant** tokens only (instruction masking).
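
A minimal sketch of that rendering, assuming the dataset columns above; `to_conversation` is a hypothetical helper, and the assistant-only masking is applied separately by the trainer (e.g., Unsloth's `train_on_responses_only` utility):

```python
def to_conversation(row):
    """Map one dataset row to a two-turn chat (hypothetical helper)."""
    return [
        {"role": "user", "content": row["Question"]},
        # CoT wrapped in <think> tags, then the final answer.
        {"role": "assistant",
         "content": f"<think>{row['Complex_CoT']}</think>\n{row['Response']}"},
    ]

# Rendered to a single training string with the tokenizer's Gemma-3 chat template:
# text = tok.apply_chat_template(to_conversation(row), tokenize=False)
```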

---

## Prompt & Output Format

**Training prompt style (system + user + assistant):**

```text
System:
Below is an instruction that describes a task,
paired with an input that provides further context.

Write a response that appropriately completes the request.

Before answering, think carefully about the question and create a step-by-step
chain of thoughts to ensure a logical and accurate response.

### Instruction:

You are a medical expert with advanced knowledge in clinical reasoning,
diagnostics, and treatment planning.

Please answer the following medical question.

User:
### Question:

{question}

### Response:

Assistant:
<think>
{complex_cot}
</think>
{final_answer}
```

**Generation tip (serve-time):**
If you don’t want to expose CoT to end-users, post-process generated text to **strip the `<think> ... </think>` block** and only show the final answer.

Python snippet to strip CoT:

```python
import re

def strip_think(text: str) -> str:
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.S | re.I).strip()
```

---

## Quick Start

### Transformers (merged weights)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

repo = "alphaaico/Medical-Diagnosis-COT-Gemma3-270M"
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

prompt = """### Question:

A 65-year-old with exertional dyspnea and orthopnea presents with bilateral
pitting edema and raised JVP. What initial pharmacologic therapy is indicated?

### Response:
"""

streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)

inputs = tok.apply_chat_template(
    [{"role": "system", "content": "You are a careful medical reasoner."},
     {"role": "user", "content": prompt}],
    tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",  # return_dict=True so **inputs unpacks below
).to(mdl.device)

out = mdl.generate(**inputs, max_new_tokens=256, do_sample=True,
                   temperature=1.0, top_p=0.95, top_k=64, streamer=streamer)
```

### PEFT LoRA (if this repo hosts LoRA adapters)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "google/gemma-3-270m-it"  # requires accepting the Gemma license on Hugging Face
repo = "alphaaico/Medical-Diagnosis-COT-Gemma3-270M"

tok = AutoTokenizer.from_pretrained(base)
base_mdl = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
mdl = PeftModel.from_pretrained(base_mdl, repo)  # loads the LoRA adapters on top of the base weights
mdl.eval()
```
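
To ship standalone weights instead of base + adapters, PEFT can fold the LoRA deltas into the base model. A minimal sketch (the output directory name is a placeholder):

```python
# Fold LoRA weights into the base model and drop the PEFT wrappers;
# the result behaves like a plain AutoModelForCausalLM.
merged = mdl.merge_and_unload()
merged.save_pretrained("medical-cot-gemma3-270m-merged")
tok.save_pretrained("medical-cot-gemma3-270m-merged")
```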

> If you see a 403/404 on the base model, make sure you’ve accepted Google’s Gemma license in your Hugging Face account.

---

## Training Procedure

* **Trainer**: TRL `SFTTrainer` (supervised fine-tuning).
* **Library**: Unsloth for fast loading + LoRA. An example reference repo for Gemma-3-270M is available on Hugging Face.

**Key hyperparameters (demo run that produced loss ≈ 3.3; a config sketch follows the note below):**

* Steps: `max_steps=500`; ideally go beyond 1,500 steps for better results.
* Optimizer: `adamw_8bit`, `weight_decay=0.01`, `lr_scheduler_type=linear`
* LR: `5e-5` (for longer runs, `2e-5` is often more stable)
* Batch: `per_device_train_batch_size=8`, `gradient_accumulation_steps=1`
* Warmup: `warmup_steps=5`
* Seed: `3407`
* Loss masking: **assistant-only** (user/system tokens ignored)

> Note: For very long CoT sequences on small GPUs (e.g., T4 16 GB), consider `per_device_train_batch_size=1` + gradient accumulation and a larger `max_seq_length`; practical fine-tuning length depends on memory (see the context limits under Model Details). For shorter queries, the model is small enough to run comfortably on edge devices.
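
A minimal configuration sketch matching the hyperparameters above, using TRL's `SFTConfig`/`SFTTrainer`; `model`, `tok`, and `train_ds` are assumed to be prepared with Unsloth as in its Gemma-3 notebooks, with conversations pre-rendered into a `text` column:

```python
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir="outputs",
    max_steps=500,                   # demo run; ideally go beyond 1500
    learning_rate=5e-5,              # 2e-5 is often more stable for longer runs
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    warmup_steps=5,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    dataset_text_field="text",       # column holding the templated conversations
)

trainer = SFTTrainer(
    model=model,                     # Unsloth-prepared LoRA model (assumption)
    processing_class=tok,            # older TRL versions use tokenizer=tok
    train_dataset=train_ds,
    args=args,
)
# Assistant-only loss: Unsloth's train_on_responses_only utility can mask
# user/system tokens before calling trainer.train().
trainer.train()
```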

---

## Evaluation

* **Training loss**: ~**3.3** at step 500 (demo run). Further decreased to ~**2.3** with parameter tuning.
* **Format compliance**: qualitatively verified to produce `<think>…</think>` followed by an answer (a simple check is sketched after this list).
* **No formal clinical benchmarks** reported yet. Contributions welcome via the Discussions tab.
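
A minimal sketch of such a format check, operating on raw generated text; the pattern and function name are ours, not part of the repo:

```python
import re

# Non-empty <think> block followed by non-empty final-answer text.
FORMAT_RE = re.compile(r"<think>\s*\S.*?</think>\s*\S", flags=re.S | re.I)

def is_format_compliant(generated: str) -> bool:
    return bool(FORMAT_RE.search(generated))

assert is_format_compliant("<think>steps...</think>\nFinal answer.")
assert not is_format_compliant("Final answer only.")
```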

---

## Safety, Ethics, and Responsible Use

* Do not rely on outputs for patient care. Validate with authoritative sources and licensed professionals.
* Be mindful of dataset artifacts and synthetic reasoning patterns.
* Consider stripping CoT in user-facing apps to avoid over-trust in intermediate narratives.

---

## License & Attribution

* **Base model**: Google **Gemma 3** — access controlled under Google’s Gemma license on Hugging Face.
* **Dataset**: `FreedomIntelligence/medical-o1-reasoning-SFT` — **Apache-2.0**.
* **This fine-tune**: Derivative work under the base model’s license terms. Review and comply with both licenses before distribution or commercial use.

**Please cite the dataset if you use this model:**

```bibtex
@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925}
}
```

---

## Acknowledgements

* **Google** for releasing Gemma 3.
* **FreedomIntelligence** for `medical-o1-reasoning-SFT`.
* **Unsloth** for streamlined fine-tuning utilities for Gemma 3.

---

## Contact

Questions, issues, or contributions: open a Discussion on this repo.
For enterprise collaboration with **Alpha AI**, reach out via the organization profile on Hugging Face or visit www.alphaai.biz.

---

### Appendix: Inference without CoT (server-side filter)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

repo = "alphaaico/Medical-Diagnosis-COT-Gemma3-270M"
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

def strip_think(txt):
    return re.sub(r"<think>.*?</think>\s*", "", txt, flags=re.S | re.I).strip()

def ask(question):
    user = f"### Question:\n\n{question}\n\n### Response:"
    msgs = [
        {"role": "system", "content": "You are a medical expert. Think step-by-step, then answer succinctly."},
        {"role": "user", "content": user},
    ]
    prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    # The rendered template already includes special tokens; don't add them twice.
    enc = tok(prompt, return_tensors="pt", add_special_tokens=False).to(mdl.device)
    out = mdl.generate(**enc, max_new_tokens=512, do_sample=True,
                       temperature=1.0, top_p=0.95, top_k=64)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tok.decode(out[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True)
    return strip_think(text)

print(ask("Elderly patient with new ankle swelling on thiazolidinedione—likely cause?"))
```