# Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025)

Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks".
♥️ If you find our project helpful for your research, please give us a star at https://github.com/yuanze-lin/Olympus and cite our paper.
## 📣 News
- Release the code for integration with task-specific models.
- Release the training & inference code.
- Release Olympus datasets.
- Release the model of Olympus.
## Overview
## Getting Started
### 🛠️ Environment Installation
To set up the environment, run the following in your shell:

```bash
git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt
```

This creates the `olympus` environment we used.
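Optionally, you can sanity-check the install before moving on. This is just a quick check, and it assumes `torch` and `transformers` are among the packages pinned in `requirements.txt`:

```python
# Quick sanity check for the olympus environment.
# Assumption: requirements.txt installs torch and transformers.
import torch
import transformers

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"transformers {transformers.__version__}")
```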
### Download Models & Data
We share our collected Olympus dataset as follows:

| Instruction | Link |
|---|---|
| Olympus Task-wise Data | Olympus_20tasks_all |
| Olympus Fine-tuning Data | Olympus.json |

- `Olympus_20tasks_all`: There are 20 JSON files under the `20 individual tasks` folder, each corresponding to a specific task. You can refer to the routing token definitions in our paper to identify the task associated with each JSON file, along with the chain-of-action data provided in `coa.json`. Each of these 21 JSON files includes both training and test data.
- `Olympus.json`: The final fine-tuning data.
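For a quick look at the downloaded task files, a minimal sketch like the one below counts the samples per file. The `jsons` directory and the assumption that each file stores a top-level list are illustrative; adjust them to match how you stored the data:

```python
import json
from pathlib import Path

# Assumption: the 21 task JSON files live in a local `jsons` folder and
# each file stores its samples as a top-level list.
for path in sorted(Path("jsons").glob("*.json")):
    with open(path) as f:
        data = json.load(f)
    print(f"{path.name}: {len(data)} samples")
```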
(1) Download the Olympus model:

```bash
python download_olympus.py
```

It will save the `Olympus` model under the `ckpts` folder.
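If you prefer to fetch the checkpoint manually, the helper script can be approximated with `huggingface_hub`. Treating `download_olympus.py` as a thin wrapper around `snapshot_download` is an assumption, though the `Yuanze/Olympus` repository id comes from the model page:

```python
from huggingface_hub import snapshot_download

# Assumption: download_olympus.py effectively snapshots the released
# checkpoint from the Hugging Face Hub into ckpts/Olympus.
snapshot_download(repo_id="Yuanze/Olympus", local_dir="ckpts/Olympus")
```

The same pattern should work for the Mipha-3B base model (`zhumj34/Mipha-3B`) used in step (3).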
(2) Download the Olympus data for fine-tuning:

```bash
python download_olympus_json.py
```

The JSON data will be saved as `Olympus.json` in the `train_data` folder. Note that `Olympus.json` includes `llava_v1_5_mix665k.json` combined with our collected data from 20 tasks.

If you want to merge the data manually, first create a `jsons` folder with `mkdir jsons`, download all the JSON files from Olympus_20tasks_all and `llava_v1_5_mix665k.json` into the `jsons` folder, then run the merge script:

```bash
python scripts/merge_data.py
```
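If you would rather merge in your own code, the step amounts to concatenating the per-task lists with `llava_v1_5_mix665k.json`. The sketch below is not the contents of `scripts/merge_data.py`; it assumes each JSON file holds a top-level list of samples and that every file in `jsons` should be merged (note the task files also contain test splits, so a real merge would keep only the training portion):

```python
import json
from pathlib import Path

# Assumption: every file in jsons/ (the 20 task files, coa.json, and
# llava_v1_5_mix665k.json) stores a top-level list of samples.
merged = []
for path in sorted(Path("jsons").glob("*.json")):
    with open(path) as f:
        merged.extend(json.load(f))

Path("train_data").mkdir(exist_ok=True)
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)
```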
(3) Download the Mipha-3B model for fine-tuning:

```bash
python download_mipha_3b.py
```

It will save the `Mipha-3B` model under the `ckpts` folder.
### Inference
Run the following code for inference:

```bash
model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
    --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
    with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
    Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
    In the following step, produce a high-resolution 3D model based on the modified image. \
    At the next point, please show a video of a cat and a dog running on a playground." \
    --model-path $MODELDIR \
    --temperature 0 \
    --conv-mode v0
```

Alternatively, you can run `bash predict.sh` as we did.
The prediction should look like:

```
Input Prompt: Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output: <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>
```
Change the `--prompt` argument to customize the input prompt as needed.
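Downstream, the routed output can be split into (routing token, sub-prompt) pairs before dispatching each piece to a task-specific model. A minimal sketch, using the four tags from the example output above:

```python
import re

# Example routed output, abbreviated from the prediction shown above.
output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# Each match yields a routing token and the sub-prompt to hand to that task's model.
for task, prompt in re.findall(r"<([^<>/]+)>(.*?)</\1>", output, flags=re.DOTALL):
    print(f"{task}: {prompt}")
```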
### Visual Instruction Tuning
Please refer here to prepare the instruction tuning data. In particular, store the images from the different datasets under the `train_data` folder.
Run the following code to fine-tune the model:

```bash
bash scripts/mipha/finetune.sh
```
### Evaluation
To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the `eval` folder. The evaluation scripts are placed under `scripts/mipha/eval/`.

For example, to test the model's performance on the VQAv2 dataset, simply run:

```bash
bash scripts/mipha/eval/vqav2.sh
```
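To sweep every benchmark in one go, you can loop over the scripts in that folder; a small sketch, assuming each script is self-contained and its evaluation data is already under `eval`:

```python
import subprocess
from pathlib import Path

# Assumption: each script under scripts/mipha/eval/ runs one benchmark end to end.
for script in sorted(Path("scripts/mipha/eval").glob("*.sh")):
    print(f"Running {script.name} ...")
    subprocess.run(["bash", str(script)], check=True)
```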
## 🔮 Supported Capacities (Covering 20 tasks)
## Diverse Applications
## Citation
If you find Olympus useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}
```
## Acknowledgement
Our project is built upon the following foundations: