wayneicloud committed · verified
Commit 9fc5b1c · Parent(s): e73738c

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -7,7 +7,7 @@ license: apache-2.0
 
 **Model Type**: VPP-LLaVA is an enhanced multimodal model built upon the LLaVA architecture. It is designed to improve visual grounding capabilities by incorporating Visual Position Prompts (VPP) into the original LLaVA model. LLaVA itself is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
 
- **Model Date**: The VPP-LLaVA enhancements were developed and tested based on the LLaVA-v1.5-7B model, which was trained in Feb. 2025.
+ **Model Date**: The VPP-LLaVA-7b enhancements were developed and tested based on the LLaVA-v1.5-7B model, which was trained in Feb. 2025.
 
 **Paper or Resources for More Information**:
 - Original LLaVA: [LLaVA: Large Language and Vision Assistant](https://llava-vl.github.io/)
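
Since the card describes VPP-LLaVA as a LLaVA-v1.5-style autoregressive vision-language model, a minimal sketch of how such a checkpoint is typically loaded and queried with the Hugging Face `transformers` API may help readers. The repo id and image path below are hypothetical placeholders, and the VPP (Visual Position Prompt) components may require the project's own codebase rather than stock `transformers`.

```python
# Minimal sketch: loading and querying a LLaVA-v1.5-style checkpoint via the
# Hugging Face transformers API. The repo id below is hypothetical; the VPP
# (Visual Position Prompt) additions may need the project's own code.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "wayneicloud/VPP-LLaVA-7b"  # hypothetical id; substitute the real repo
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-v1.5 prompt format: an <image> placeholder plus a grounding-style query.
prompt = "USER: <image>\nWhere is the red cup in this image? ASSISTANT:"
image = Image.open("example.jpg")  # placeholder path

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```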