---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
language:
- en
pipeline_tag: visual-question-answering
---

# Model Card for Llava-Phi2

This is a multimodal implementation of the [Phi2](https://huggingface.co/microsoft/phi-2) model, inspired by [LLaVA-Phi](https://github.com/zhuyiche/llava-phi).

## Model Details
1. LLM Backbone: [Phi2](https://huggingface.co/microsoft/phi-2)
2. Vision Tower: [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
3. Pretraining Dataset: [LAION-CC-SBU dataset with BLIP captions (200k samples)](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
4. Finetuning Dataset: [Instruct 150k dataset based on COCO](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)
5. Finetuned Model: [RaviNaik/Llava-Phi2](https://huggingface.co/RaviNaik/Llava-Phi2)
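
How the pieces fit together (LLaVA-style): the vision tower encodes the image into patch features, a small trainable projector maps those features into Phi2's embedding space, and the projected image tokens are fed to the LLM together with the text tokens. The snippet below is only an illustrative sketch of that wiring; the class and variable names are made up and it is not the actual llava-phi code.

```python
# Illustrative LLaVA-style wiring (assumed, simplified): CLIP patch features are
# projected into Phi2's embedding space and prepended to the text embeddings.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel


class ToyLlavaPhi(nn.Module):  # hypothetical name, for illustration only
    def __init__(self,
                 llm_name="microsoft/phi-2",
                 vision_name="openai/clip-vit-large-patch14-336"):
        super().__init__()
        self.vision_tower = CLIPVisionModel.from_pretrained(vision_name)
        # Older transformers versions may need trust_remote_code=True for phi-2.
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        # Projector: CLIP hidden size (1024) -> Phi2 hidden size (2560).
        self.projector = nn.Sequential(
            nn.Linear(self.vision_tower.config.hidden_size, self.llm.config.hidden_size),
            nn.GELU(),
            nn.Linear(self.llm.config.hidden_size, self.llm.config.hidden_size),
        )

    def forward(self, pixel_values, input_ids):
        # Patch features from the vision tower (drop the CLS token, as in LLaVA).
        image_feats = self.vision_tower(pixel_values=pixel_values).last_hidden_state[:, 1:, :]
        image_embeds = self.projector(image_feats)
        # Prepend the projected image tokens to the text token embeddings.
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)
```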

### Model Sources

- **Original Repository:** [LLaVA-Phi](https://github.com/zhuyiche/llava-phi)
- **Paper:** [LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model](https://arxiv.org/pdf/2401.02330)
- **Demo:** [MultiModal-Phi2 Space](https://huggingface.co/spaces/RaviNaik/MultiModal-Phi2)

## How to Get Started with the Model

Use the code below to get started with the model.
1. Clone the [llava-phi](https://github.com/zhuyiche/llava-phi) repository and navigate to the llava-phi folder:
```bash
git clone https://github.com/zhuyiche/llava-phi.git
cd llava-phi
```
2. Install the package:
```bash
conda create -n llava_phi python=3.10 -y
conda activate llava_phi
pip install --upgrade pip # enable PEP 660 support
pip install -e .
```
3. Run the model:
```bash
python llava_phi/eval/run_llava_phi.py --model-path="RaviNaik/Llava-Phi2" \
    --image-file="https://huggingface.co/RaviNaik/Llava-Phi2/resolve/main/people.jpg?download=true" \
    --query="How many people are there in the image?"
```
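
If you prefer to drive the evaluation script from Python rather than the shell, a thin wrapper around the same command works; the sketch below simply re-issues the CLI call shown above (run it from the llava-phi repo root with the llava_phi environment active).

```python
# Minimal wrapper around the run_llava_phi.py CLI shown above (assumes the
# llava_phi conda environment is active and the current directory is the repo root).
import subprocess

result = subprocess.run(
    [
        "python", "llava_phi/eval/run_llava_phi.py",
        "--model-path=RaviNaik/Llava-Phi2",
        "--image-file=https://huggingface.co/RaviNaik/Llava-Phi2/resolve/main/people.jpg?download=true",
        "--query=How many people are there in the image?",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # stdout contains whatever the script prints (typically the answer)
```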

### Acknowledgement
This implementation is based on the wonderful work done by: \
[LLaVA-Phi](https://github.com/zhuyiche/llava-phi) \
[LLaVA](https://github.com/haotian-liu/LLaVA) \
[Phi2](https://huggingface.co/microsoft/phi-2)