---
library_name: llava
license: cc-by-4.0
pipeline_tag: image-text-to-text
---
|
|
|
[[Paper]](https://arxiv.org/pdf/2501.09446) [[GitHub]](https://github.com/zw615/Double_Visual_Defense)
|
|
|
A Delta2-LLaVA-v1.5-7B model, adversarially visual-instruction-tuned on LLaVA-1.5 data so that it matches the helpfulness of non-robust VLMs on clean data while remaining robust under adversarial attack. The model is presented in the paper [Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness](https://huggingface.co/papers/2501.09446).
|
|
|
Project page: https://doublevisualdefense.github.io/
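
## Usage

A minimal inference sketch, assuming this checkpoint works with the standard LLaVA loading and evaluation utilities from the linked repository; the model path below is a placeholder to replace with this model's repo id or a local checkpoint.

```python
# Minimal sketch (assumption: standard LLaVA utilities apply to this checkpoint).
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Placeholder: substitute this model's Hugging Face repo id or a local path.
model_path = "path/to/Delta2-LLaVA-v1.5-7B"

# Build the argument object expected by the repository's evaluation helper.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What is shown in this image?",
    "conv_mode": None,
    "image_file": "https://llava-vl.github.io/static/images/view.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the model's answer to the query
```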
|
|
|
## Release

These models are released under the Creative Commons Attribution 4.0 license.

LLNL-DATA-2003001
|
|
|
## Citation

If you find this model useful, please consider citing our paper:
|
```bibtex
@article{wang2025double,
  title={Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness},
  author={Wang, Zeyu and Xie, Cihang and Bartoldson, Brian and Kailkhura, Bhavya},
  journal={arXiv preprint arXiv:2501.09446},
  year={2025}
}
```