---
license: unknown
---
# LLaVA-Med model for multimodal radiology report generation
This model is based on LLaVA-Med 1.0, fine-tuned to generate medical reports from a chest X-ray and a prompt. In our case, the instruction was "write the finding section of chest x-ray radiology report".
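For illustration, below is a minimal inference sketch using the Hugging Face `transformers` LLaVA API. Note the assumptions: it presumes the checkpoint has been converted to the `LlavaForConditionalGeneration` format and that a plain USER/ASSISTANT chat template applies; the original LLaVA-Med codebase's own inference scripts remain the authoritative path, and the model id below is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Placeholder: replace with the actual repo id or local path of this checkpoint,
# assuming it has been converted to the HF LLaVA format.
model_id = "path/to/this-checkpoint"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chest_xray.jpg")  # a frontal chest X-ray

# The instruction used during fine-tuning; the USER/ASSISTANT wrapper is an
# assumed chat format, not necessarily the one this checkpoint expects.
prompt = (
    "USER: <image>\n"
    "write the finding section of chest x-ray radiology report "
    "ASSISTANT:"
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```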
The dataset used for fine-tuning was the MIMIC-CXR share of the Radiology Report Generation challenge held at the BioNLP Workshop at ACL 2024. We fine-tuned on its 148,374 MIMIC-CXR findings sections for 3 epochs.
The model's metrics on the 1,063 samples of the challenge's hidden test set are the following:
| Method | BLEU-4 | ROUGE-L | BERTScore | F1-CheXbert | F1-RadGraph | Avg |
|---|---|---|---|---|---|---|
| llavamed1.0 | 5.05 | 19.13 | 47.51 | 23.06 | 15.77 | 22.10 |
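The Avg column works out to the unweighted mean of the five preceding metrics, which a quick sanity check confirms:

```python
# Reproduce the "Avg" column as the unweighted mean of the five metrics.
scores = {"BLEU-4": 5.05, "ROUGE-L": 19.13, "BERTScore": 47.51,
          "F1-CheXbert": 23.06, "F1-RadGraph": 15.77}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 22.10
```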
The metrics were calculated directly by the challenge organizers; however, you can reproduce them with the following example code:
```python
import json
import logging

from vilmedic.blocks.scorers.scores import compute_scores

# Ground-truth findings sections (references).
refs = [
    "The lungs are clear. The cardiomediastinal silhouette is within normal limits. No acute osseous abnormalities.",
    "The lungs are clear. There is no pleural effusion or pneumothorax. The cardiomediastinal silhouette is normal.",
]

# Model-generated findings sections (hypotheses), paired with the references above.
hyps = [
    "The lungs are clear. There is no pleural effusion or pneumothorax. The cardiomediastinal silhouette is normal.",
    "The lungs are clear. The cardiomediastinal silhouette is within normal limits. No acute osseous abnormalities.",
]

print("Computing metrics, this can take a while...")
scores = compute_scores(
    ["ROUGEL", "bertscore", "radgraph", "BLEU", "chexbert"],
    refs=refs,
    hyps=hyps,
    split=None,
    seed=None,
    config=None,
    epoch=None,
    logger=logging.getLogger(__name__),
    dump=False,
)
print(json.dumps(scores, indent=4))
```
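The snippet assumes the ViLMedic library is installed (e.g., via `pip install vilmedic`). Since its result is passed to `json.dumps`, `compute_scores` returns a JSON-serializable mapping from each requested metric to its score.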
More details about the challenge can be found on the challenge web page or on the workshop site.