Commit c285f67 by jymcc · verified · 1 Parent(s): 02424be

Update README.md

Files changed (1)
  1. README.md +69 -3
README.md CHANGED
@@ -1,3 +1,69 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - meta-llama/Llama-3.2-3B-Instruct
+ pipeline_tag: text-classification
+ tags:
+ - medical
+ datasets:
+ - FreedomIntelligence/medical-o1-verifiable-problem
+ ---
+ # <span>Introduction</span>
+
+ This is a **medical verifier** that judges whether an LLM's output on a [medical verifiable problem](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem) agrees with the reference answer. Its verdicts can be used to improve the medical reasoning capabilities of LLMs.
+
+ For details, see our [paper](https://arxiv.org/pdf/2412.18925) and [GitHub repository](https://github.com/FreedomIntelligence/HuatuoGPT-o1).
+ You can also explore [HuatuoGPT-o1](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B), our medical LLM specialized in complex medical reasoning.
+
+
+ # <span>Usage</span>
+ Use the code below to run the verifier:
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # Load tokenizer and model.
+ # Note: attn_implementation="flash_attention_2" requires the flash-attn package;
+ # omit that argument to use the default attention implementation instead.
+ model_path = 'FreedomIntelligence/medical_o1_verifier_3B'
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(
+     model_path, torch_dtype="auto", device_map="auto", attn_implementation="flash_attention_2", num_labels=2
+ )
+
+ # Evaluation template: slots are (model response, reference answer, EOS token)
+ template = """<Model Response>
+ {}
+ </Model Response>
+
+ <Reference Answer>
+ {}
+ </Reference Answer>
+
+ Your task is to evaluate the model response by comparing it to the reference answer. If the model response is correct and aligns with the reference answer, output "True" . If it is incorrect or fails to select the correct option (if options are provided), output "False" . {}"""
+
+ # Fill in the template, tokenize it, and score it
+ LLM_response = 'The answer is 25 percentage'
+ ground_truth_answer = '25%'
+ prompt = template.format(LLM_response, ground_truth_answer, tokenizer.eos_token)
+ input_batch = tokenizer([prompt], return_tensors="pt").to(model.device)
+ with torch.no_grad():  # inference only, so skip gradient tracking
+     logits = model(**input_batch, return_dict=True).logits
+ probabilities = F.softmax(logits, dim=-1)
+ # Class index 1 corresponds to "True" (the response is judged correct)
+ result = "True" if probabilities[0, 1] > 0.5 else "False"
+
+ print(f"Evaluation Result: {result}")
+ ```
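
The two model-independent steps in the usage code, filling the evaluation template and turning the two class logits into a verdict, can be sketched on their own. This is a minimal illustration, not part of the committed README: the helper names `build_prompt`, `prob_true`, and `verdict_from_logits` are hypothetical, and the sketch assumes (as the usage code implies) that class index 1 means "True". It uses only the standard library so it runs without downloading the model.

```python
import math

# Evaluation template copied verbatim from the usage code.
# Slots: model response, reference answer, EOS token.
TEMPLATE = """<Model Response>
{}
</Model Response>

<Reference Answer>
{}
</Reference Answer>

Your task is to evaluate the model response by comparing it to the reference answer. If the model response is correct and aligns with the reference answer, output "True" . If it is incorrect or fails to select the correct option (if options are provided), output "False" . {}"""


def build_prompt(response: str, reference: str, eos_token: str) -> str:
    """Fill the template exactly as the usage code does (hypothetical helper)."""
    return TEMPLATE.format(response, reference, eos_token)


def prob_true(logit_false: float, logit_true: float) -> float:
    """Two-class softmax probability of index 1, in a numerically stable form."""
    diff = logit_true - logit_false
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def verdict_from_logits(logit_false: float, logit_true: float, threshold: float = 0.5):
    """Return ("True" or "False", probability), mirroring the > 0.5 rule above."""
    p = prob_true(logit_false, logit_true)
    return ("True" if p > threshold else "False"), p
```

With the default threshold of 0.5, the verdict is "True" exactly when the logit at index 1 is larger than the logit at index 0, so the softmax matters only if you want the probability itself or a stricter threshold.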
+
+
+ # <span>📖 Citation</span>
+ ```
+ @misc{chen2024huatuogpto1medicalcomplexreasoning,
+   title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
+   author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
+   year={2024},
+   eprint={2412.18925},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2412.18925},
+ }
+ ```