---
library_name: llava
license: cc-by-4.0
pipeline_tag: image-text-to-text
---
|
|
|
[[Paper]](https://arxiv.org/pdf/2501.09446) [[GitHub]](https://github.com/zw615/Double_Visual_Defense)
|
|
|
A Delta2-LLaVA-v1.5-7B model, adversarially visual-instruction-tuned on LLaVA-1.5 data so that it matches the helpfulness of non-robust VLMs on clean data while remaining robust under adversarial attack. The model is presented in the paper [Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness](https://huggingface.co/papers/2501.09446).
|
|
|
Project page: https://doublevisualdefense.github.io/
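
## Usage

A minimal inference sketch, assuming this checkpoint works with the standard LLaVA loading and evaluation utilities from the linked repository; the model path below is a placeholder to replace with this model's repo id or a local checkpoint.

```python
# Minimal sketch (assumption: standard LLaVA utilities apply to this checkpoint).
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Placeholder: substitute this model's Hugging Face repo id or a local path.
model_path = "path/to/Delta2-LLaVA-v1.5-7B"

# Build the argument object expected by the repository's evaluation helper.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What is shown in this image?",
    "conv_mode": None,
    "image_file": "https://llava-vl.github.io/static/images/view.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the model's answer to the query
```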
|
|
|
## Release

These models are released under the Creative Commons Attribution 4.0 license.

LLNL-DATA-2003001
|
|
|
## Citation

If you find this model useful, please consider citing our paper:
|
```bibtex
@article{wang2025double,
  title={Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness},
  author={Wang, Zeyu and Xie, Cihang and Bartoldson, Brian and Kailkhura, Bhavya},
  journal={arXiv preprint arXiv:2501.09446},
  year={2025}
}
```