RaviNaik committed
Commit d4a17ce · verified · 1 Parent(s): e60d2e0

Create README.md

Files changed (1):
  1. README.md +58 -0

README.md ADDED
@@ -0,0 +1,58 @@
---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
language:
- en
pipeline_tag: visual-question-answering
---

# Model Card for Llava-Phi2

This is a multimodal implementation of the [Phi2](https://huggingface.co/microsoft/phi-2) model, inspired by [LLaVA-Phi](https://github.com/zhuyiche/llava-phi).

## Model Details
1. LLM Backbone: [Phi2](https://huggingface.co/microsoft/phi-2)
2. Vision Tower: [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
3. Pretraining Dataset: [LAION-CC-SBU dataset with BLIP captions (200k samples)](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
4. Finetuning Dataset: [Instruct 150k dataset based on COCO](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)
5. Finetuned Model: [RaviNaik/Llava-Phi2](https://huggingface.co/RaviNaik/Llava-Phi2) (a quick way to fetch the weights locally is sketched below)

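If you only need the finetuned checkpoint on disk (for example, to inspect or repackage the weights), it can be fetched with the Hugging Face Hub CLI. This is a minimal sketch rather than part of the original instructions; it assumes a recent `huggingface_hub` is installed and the target directory name is arbitrary:

```bash
# Download the RaviNaik/Llava-Phi2 checkpoint into a local folder.
# Requires: pip install -U huggingface_hub
huggingface-cli download RaviNaik/Llava-Phi2 --local-dir ./Llava-Phi2
```
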
### Model Sources

- **Original Repository:** [LLaVA-Phi](https://github.com/zhuyiche/llava-phi)
- **Paper:** [LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model](https://arxiv.org/pdf/2401.02330)
- **Demo:** [Demo Link](https://huggingface.co/spaces/RaviNaik/MultiModal-Phi2)

## How to Get Started with the Model

Use the code below to get started with the model.
1. Clone the [llava-phi](https://github.com/zhuyiche/llava-phi) repository and navigate to the llava-phi folder:
```bash
git clone https://github.com/zhuyiche/llava-phi.git
cd llava-phi
```
2. Install the package:
```bash
conda create -n llava_phi python=3.10 -y
conda activate llava_phi
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
3. Run the model:
```bash
python llava_phi/eval/run_llava_phi.py --model-path="RaviNaik/Llava-Phi2" \
    --image-file="https://huggingface.co/RaviNaik/Llava-Phi2/resolve/main/people.jpg?download=true" \
    --query="How many people are there in the image?"
```

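If the script follows the original LLaVA image loader, `--image-file` should also accept a local path rather than a URL. A hedged variation of the command above (the local filename and question are illustrative placeholders, not files shipped with the repo):

```bash
# Same CLI flags as above, but pointing at an image on disk and asking a different question.
python llava_phi/eval/run_llava_phi.py --model-path="RaviNaik/Llava-Phi2" \
    --image-file="./people.jpg" \
    --query="Describe the scene in this image."
```
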
### Acknowledgement
This implementation is based on the wonderful work done by: \
[LLaVA-Phi](https://github.com/zhuyiche/llava-phi) \
[LLaVA](https://github.com/haotian-liu/LLaVA) \
[Phi2](https://huggingface.co/microsoft/phi-2)