---
license: apache-2.0
base_model:
- liuhaotian/llava-v1.5-7b
---
# LISA++ (LISA_Plus_7b): An Improved Baseline for Reasoning Segmentation with Large Language Model
🤗[Data](https://huggingface.co/collections/Senqiao/lisa-67713837a32d6abf516a162e) | 📄[Paper](https://arxiv.org/abs/2312.17240)
# Model Card for LISA++ (LISA_Plus_7b)
## Model Details
- **Developed by**: Senqiao Yang, The Chinese University of Hong Kong & SmartMore
- **Model Type**: Large Vision-Language Model (VLM) for reasoning segmentation
- **Language(s)**: Supports natural language queries in English
- **License**: Apache 2.0
- **Base Model**: Finetuned from [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)
## Model Description
LISA++ (LISA_Plus_7b) is an improved baseline for reasoning segmentation with large language models. It enhances the capabilities of its predecessor by incorporating instance segmentation and enabling more natural, multi-turn dialogues through Segmentation in Dialogue (SiD). These advancements are achieved without structural changes or additional data sources, relying instead on curated samples from existing segmentation datasets.
### Key Enhancements:
1. **Instance Segmentation**: Distinguishes individual instances of the same category, providing more detailed scene analysis alongside the existing multi-region semantic segmentation.
2. **Segmentation in Dialogue (SiD)**: Improved capability for multi-turn dialogue, allowing the model to incorporate segmentation results directly into text responses, leading to more natural and flexible conversations.
3. **Refined Data Curation**: Uses datasets like COCO and ADE20K to improve segmentation and dialogue integration.
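The difference between semantic and instance segmentation described above can be illustrated with a toy example. This is a minimal sketch, not LISA++ code: the masks, the pixel indexing, and the helper `to_semantic_mask` are hypothetical and exist only to show what "separate instances of the same category" means.

```python
# Toy 1x6 "image" with pixels indexed 0..5 and two cats side by side.
# Each instance mask is a set of pixel indices (hypothetical data).
instance_masks = {
    "cat_1": {0, 1},
    "cat_2": {3, 4, 5},
}


def to_semantic_mask(masks):
    """Merge all instance masks of one category into a single region."""
    merged = set()
    for pixels in masks.values():
        merged |= pixels
    return merged


# Semantic segmentation: one mask for the category "cat".
semantic_mask = to_semantic_mask(instance_masks)
print(sorted(semantic_mask))

# Instance segmentation: each cat keeps its own mask and can be
# referred to separately in a dialogue turn.
for name, pixels in instance_masks.items():
    print(name, sorted(pixels))
```

The semantic mask loses the boundary between the two cats, while the instance masks keep it, which is the information a follow-up question such as "which cat is on the left?" would rely on.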
## Intended Uses & Limitations
### Direct Use
- Interactive image understanding and segmentation
- Multi-turn reasoning about segmented objects in images
- Visual question-answering with spatial awareness
### Out-of-Scope Use
- Real-time medical or security applications without further validation
- Applications requiring precise 3D object segmentation
## How to Use
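As of now, the model is not available via the Hugging Face Inference API. LISA++ uses a custom architecture, so the generic `transformers` `image-segmentation` pipeline is not expected to load it or to accept a text query. To use it locally, download the checkpoint and run it with the inference code that accompanies the model:
```python
from huggingface_hub import snapshot_download

# Download the LISA++ checkpoint from the Hub into the local cache.
# The repo id matches the model repository linked on this card.
checkpoint_dir = snapshot_download(repo_id="Senqiao/LISA_Plus_7b")
print(checkpoint_dir)

# Example inputs for reasoning segmentation
image_path = "example.jpg"
query = "Highlight all the cats in the image."
# Pass checkpoint_dir, image_path, and query to the model's own
# inference script; the exact entry point is not specified on this card.
```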
For further details, refer to the [model repository](https://huggingface.co/Senqiao/LISA_Plus_7b).
## Training Data
LISA++ is trained on curated samples from:
- **COCO Dataset**: Common Objects in Context
- **ADE20K Dataset**: Scene parsing dataset
- **Extended ReasonSeg Dataset**: Enhanced for multi-target instance segmentation
The training data is structured to improve segmentation and dialogue capabilities.
## Training Procedure
- **Base Model**: Finetuned from [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)
- **Optimizer**: Not specified in this card
- **Training Datasets**: ReasonSeg-Inst and ReasonSeg-Sem
- **Hardware**: GPUs (exact model not specified in this card)
- **Loss Functions**: Combination of segmentation and language modeling losses
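As a rough illustration of how a segmentation loss and a language modeling loss can be combined into one objective, consider the sketch below. The card does not specify the individual terms or their weights, so the choice of binary cross-entropy plus Dice for the mask, the token-level negative log-likelihood for the text, and the weights `w_text`, `w_bce`, and `w_dice` are all assumptions made for illustration.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over per-pixel mask probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def dice_loss(pred, target, eps=1e-7):
    """One minus the soft Dice coefficient between prediction and target."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def lm_loss(token_probs):
    """Mean negative log-likelihood of the ground-truth text tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def total_loss(pred_mask, gt_mask, token_probs,
               w_text=1.0, w_bce=1.0, w_dice=1.0):
    """Weighted sum of the text loss and the two mask losses.

    The weights are hypothetical defaults, not values from the paper.
    """
    seg = w_bce * bce_loss(pred_mask, gt_mask) + w_dice * dice_loss(pred_mask, gt_mask)
    return w_text * lm_loss(token_probs) + seg


# Flattened 4-pixel mask and the model's probabilities for 3 target tokens.
good = total_loss([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0], [0.9, 0.8, 0.95])
bad = total_loss([0.2, 0.3, 0.9, 0.7], [1, 1, 0, 0], [0.3, 0.2, 0.4])
print(good < bad)
```

A prediction that matches both the mask and the text yields a lower combined loss than one that misses both, which is the property a joint objective of this kind is meant to have.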
## Evaluation Results
LISA++ significantly improves segmentation accuracy compared to its predecessor:
- **ReasonSeg-Inst (Instance Segmentation Performance)**:
  - AP50: **34.1%** (vs. 13.7% in LISA-7B)
  - AP75: **22.1%** (vs. 6.6% in LISA-7B)
  - mAP: **21.5%** (vs. 7.2% in LISA-7B)
- **ReasonSeg-Sem (Semantic Segmentation Performance)**:
  - gIoU: **64.2%** (vs. 53.6% in LISA)
  - cIoU: **68.1%** (vs. 52.3% in LISA)
These results highlight LISA++'s enhanced capabilities in both instance and semantic segmentation tasks.
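For readers unfamiliar with the two semantic metrics, gIoU is the mean of the per-image intersection-over-union values, while cIoU is the cumulative intersection divided by the cumulative union over the whole evaluation set. The sketch below computes both on toy data. Masks are represented as sets of pixel indices, and scoring an image with an empty union as 1.0 is an assumption of this example, not something stated on this card.

```python
def iou_counts(pred, gt):
    """Return (intersection, union) pixel counts for two masks."""
    return len(pred & gt), len(pred | gt)


def giou(pairs):
    """Mean of per-image IoU over (prediction, ground truth) pairs."""
    ious = []
    for pred, gt in pairs:
        inter, union = iou_counts(pred, gt)
        # Assumption: an image with no predicted and no true pixels scores 1.0.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


def ciou(pairs):
    """Cumulative intersection over cumulative union across all images."""
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        inter, union = iou_counts(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union


# Two toy images: a small object predicted perfectly,
# and a larger object that is over-segmented.
pairs = [
    ({0, 1}, {0, 1}),                          # IoU = 2 / 2
    ({0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3}),  # IoU = 4 / 8
]
print(giou(pairs))  # (1.0 + 0.5) / 2 = 0.75
print(ciou(pairs))  # (2 + 4) / (2 + 8) = 0.6
```

Because cIoU pools pixels across images, it is dominated by large objects, whereas gIoU weights every image equally; this is why the two numbers can differ noticeably on the same predictions.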
## Bias, Risks, and Limitations
- **Bias**: The model may inherit biases present in its training datasets (COCO, ADE20K).
- **Limitations**: May struggle with unseen object categories or highly cluttered scenes.
- **Ethical Considerations**: Users should verify outputs before deploying in critical applications.
## Environmental Impact
- **Hardware Used**: NVIDIA A100 GPUs (or equivalent)
- **Training Duration**: Not specified
- **Estimated Carbon Emissions**: Not specified
## Citation
If you use LISA_Plus_7b in your research, please cite:
```
@article{yang2024lisa++,
title={LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model},
author={Senqiao Yang},
journal={arXiv preprint arXiv:2312.17240},
year={2024}
}
```
## Contact Information
For questions or feedback, contact:
- **Author**: Senqiao Yang
---
This AI-generated model card provides an overview of LISA_Plus_7b's capabilities, training methodology, and evaluation metrics, based on the Hugging Face model repository and the arXiv paper.