---
license: apache-2.0
base_model:
- lmsys/vicuna-7b-v1.5
- openai/clip-vit-large-patch14-336
pipeline_tag: image-text-to-text
---
# p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
This is the official model checkpoint of [p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay](https://arxiv.org/abs/2412.04449). 
Please refer to [this repository](https://github.com/MCG-NJU/p-MoD) for our code.

## Model Description
This model is pretrained on the [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) image caption data and instruction-tuned on [llava-v1_5-mix-665k](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json).
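
## Usage
The sketch below shows one way to run inference with this checkpoint, assuming p-MoD keeps LLaVA's loading and inference interface (`load_pretrained_model`, `conv_templates`, `tokenizer_image_token`, `process_images`). The model path, image file, and conversation template name are placeholders; please check the [p-MoD repository](https://github.com/MCG-NJU/p-MoD) for the exact entry points and evaluation scripts.
```python
import torch
from PIL import Image

# Assumption: p-MoD exposes the same LLaVA-style package layout; adjust the
# import paths if the repository renames the package.
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token, process_images

model_path = "path/to/this/checkpoint"  # placeholder: local path or Hub repo id
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base=None, model_name="pmod_llava_llama"
)

# Preprocess the input image with the CLIP ViT-L/14-336 image processor.
image = Image.open("example.jpg").convert("RGB")
image_tensor = process_images([image], image_processor, model.config).to(
    model.device, dtype=torch.float16
)

# Build a Vicuna-v1.5 style prompt containing the image placeholder token.
conv = conv_templates["vicuna_v1"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe this image.")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True).strip())
```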

## Citation
If you find our work helpful for your research and applications, please cite our paper:
```bibtex
@article{zhang2024pmod,
  title={p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay},
  author={Zhang, Jun and Meng, Desen and Qi, Ji and Huang, Zhenpeng and Wu, Tao and Wang, Limin},
  journal={arXiv preprint arXiv:2412.04449},
  year={2024}
}
```

## License
Llama 2 is licensed under the LLAMA 2 Community License, 
Copyright (c) Meta Platforms, Inc. All Rights Reserved.