---
license: apache-2.0
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
---

# FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models

Jintao Tong1, Wenwei Jin2, Pengda Qin2, Anqi Li3, Yixiong Zou1✉, Yuhong Li2✉, Yuhua Li1, Ruixuan Li1

1School of Computer Science and Technology, Huazhong University of Science and Technology
2Xiaohongshu Inc., 3Institute of Information Science, Beijing Jiaotong University

[![GitHub](https://img.shields.io/badge/Github-181717?logo=github&logoColor=white)](https://github.com/TungChintao/FlowCut) [![arXiv](https://img.shields.io/badge/arXiv-2505.19536-AD1C18.svg?logo=arXiv)](https://arxiv.org/pdf/2505.19536) [![License](https://img.shields.io/badge/📃%20License-Apache_2.0-yellow.svg)](https://github.com/TungChintao/FlowCut/blob/main/LICENSE)

## 💡 Highlights

> **TL;DR:** To address the inefficiency caused by excessive visual tokens in LVLMs, we take a unified, bottom-up, information-flow perspective that reveals how redundancy emerges dynamically, and we introduce FlowCut, which aligns pruning decisions with the model's inherent behavior and outperforms existing approaches.

## 🛠 Preparation

Our code is easy to use.

1. Clone the [LLaVA](https://github.com/haotian-liu/LLaVA) repository.

   ```shell
   git clone https://github.com/haotian-liu/LLaVA.git
   cd LLaVA
   ```

2. Install the [LLaVA](https://github.com/haotian-liu/LLaVA) environment.

   ```shell
   conda create -n llava python=3.10 -y
   conda activate llava
   pip install --upgrade pip
   pip install -e .
   pip install flash-attn --no-build-isolation
   ```

3. For normal usage, you can install the package from PyPI:

   ```shell
   pip install flowcut
   ```

   For development, you can install the package by cloning the repository:

   ```shell
   git clone https://github.com/TungChintao/FlowCut
   cd flowcut
   pip install -e .
   ```

The file organization is as follows:

```
├── LLaVA-main
├── flowcut
├── llava
├── playground
├── script
```

## 🚀 Quick Start

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model
from flowcut import flowcut

model_path = "liuhaotian/llava-v1.5-7b"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

## FlowCut retains 64 visual tokens
model = flowcut(model, target_num=64)
```

A complete single-image inference sketch is provided in the end-to-end example further below.

## 📖 Evaluation

The evaluation code follows the structure of [LLaVA](https://github.com/haotian-liu/LLaVA) or [Lmms-Eval](https://github.com/EvolvingLMMs-Lab/lmms-eval). After loading the model, simply add two lines as shown below:

```python
## Load LLaVA Model (code from llava.eval.model_vqa_loader)
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, args.model_base, model_name)

## add FlowCut
from flowcut import flowcut
model = flowcut(model, target_num=64)
```

Script templates (please follow the detailed instructions in [LLaVA-Evaluation](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md)):

```shell
bash scripts/v1_5/eval/[Benchmark].sh
```

Examples:

```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
```

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
```

## 🎯 Training

The training code follows the structure of [LLaVA](https://github.com/haotian-liu/LLaVA). After loading the model, simply add two lines as shown below:

```python
## Load LLaVA Model (code from llava.train)
## ... model loading code ...

## add FlowCut
from flowcut import flowcut
model = flowcut(model, target_num=64)

## training
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       **data_module)
```

## 🔑 License

- This project is released under the [Apache 2.0 license](https://github.com/TungChintao/FlowCut/blob/main/LICENSE).
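## 🧪 End-to-End Example

For a quick end-to-end check after applying FlowCut, here is a minimal single-image inference sketch built on LLaVA's standard utilities (`conv_templates`, `process_images`, `tokenizer_image_token`), mirroring `llava.eval.run_llava`. It assumes the `tokenizer`, `model`, and `image_processor` from the Quick Start above; the image path, question, and the `vicuna_v1` conversation template are placeholder choices for illustration, not part of the FlowCut package.

```python
import torch
from PIL import Image

from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

# Placeholders: replace with your own image and question.
image = Image.open("view.jpg").convert("RGB")
question = "Describe this image in detail."

# Preprocess the image with LLaVA's image processor.
image_tensor = process_images([image], image_processor, model.config).to(
    model.device, dtype=torch.float16
)

# Build the prompt with the vicuna_v1 conversation template used by LLaVA-1.5.
conv = conv_templates["vicuna_v1"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\n" + question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Tokenize, mapping the image placeholder to IMAGE_TOKEN_INDEX.
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=128,
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```

Aside from the earlier `flowcut(model, target_num=64)` call, this is plain LLaVA inference: the pruning happens inside the model, so the generation interface is unchanged.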
## 📌 Citation

- If you find this project useful in your research, please consider citing:

```bibtex
@article{tong2025flowcut,
  title={FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models},
  author={Tong, Jintao and Jin, Wenwei and Qin, Pengda and Li, Anqi and Zou, Yixiong and Li, Yuhong and Li, Yuhua and Li, Ruixuan},
  journal={arXiv preprint arXiv:2505.19536},
  year={2025}
}
```