Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025)


Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks"

โ™ฅ๏ธ If you find our project is helpful for your research, please kindly give us a ๐ŸŒŸ on https://github.com/yuanze-lin/Olympus and cite our paper ๐Ÿ“‘

📣 News

  • Release the code for integration with task-specific models.
  • Release the training & inference code.
  • Release Olympus datasets.
  • Release the model of Olympus.

🔅 Overview


Getting Started

๐Ÿ› ๏ธ Environment Installation

To set up the environment, run the following commands in your shell:

git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt

This creates the olympus environment we used.

Download Models & Data

We share our collected Olympus dataset as follows:

Instruction                 Link
Olympus Task-wise Data      Olympus_20tasks_all
Olympus Fine-tuning Data    Olympus.json
  • Olympus_20tasks_all: contains 20 JSON files, one per task folder, each corresponding to a specific task. Refer to the routing token definitions in our paper to identify the task associated with each JSON file; the chain-of-action data is provided in coa.json. Each of these 21 JSON files includes both training and test data (see the inspection sketch below).
  • Olympus.json: the final fine-tuning data.
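
To sanity-check a downloaded file, you can load it directly. The sketch below is our own illustration and assumes each file is a LLaVA-style JSON list of records with a "conversations" field; adjust the field names if the actual schema differs:

# Inspect one task-specific JSON file (e.g., coa.json).
# Assumption: each file is a list of records, each holding an "id"
# and a "conversations" list of {"from": ..., "value": ...} turns.
import json
from pathlib import Path

path = Path("jsons/coa.json")  # any of the 21 JSON files
records = json.loads(path.read_text())

print(f"{path.name}: {len(records)} samples")
print("keys of first record:", sorted(records[0].keys()))
for turn in records[0].get("conversations", [])[:2]:
    print(f'{turn["from"]}: {turn["value"][:80]}')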

(1) Download the Olympus model:

python download_olympus.py

It will save the Olympus model under the ckpts folder.
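
If you prefer to fetch the weights yourself, here is a minimal sketch using huggingface_hub, assuming the checkpoint is hosted in the Yuanze/Olympus repository (which download_olympus.py presumably wraps):

# Manual alternative to download_olympus.py.
# Assumption: the weights live in the Yuanze/Olympus Hugging Face repo.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Yuanze/Olympus", local_dir="ckpts/Olympus")
print("Olympus checkpoint saved under ckpts/Olympus")

The same pattern applies to the Mipha-3B base model (zhumj34/Mipha-3B) used for fine-tuning below.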

(2) Download the Olympus data for fine-tuning:

python download_olympus_json.py

The JSON data will be saved as Olympus.json under the train_data folder. Note that Olympus.json combines llava_v1_5_mix665k.json with our collected data from the 20 tasks.

If you want to merge the data manually, first create a jsons folder (mkdir jsons), download all the JSON files from Olympus_20tasks_all together with llava_v1_5_mix665k.json into the jsons folder, and then run the merge script:

python scripts/merge_data.py
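
Conceptually, the merge just concatenates the task files and llava_v1_5_mix665k.json into one training list; the actual logic lives in scripts/merge_data.py. A minimal sketch of that idea, assuming every file is a JSON list:

# Illustrative only -- use scripts/merge_data.py for the real merge.
# Assumption: every file under jsons/ is a JSON list of training records.
import json
from pathlib import Path

merged = []
for path in sorted(Path("jsons").glob("*.json")):
    merged.extend(json.loads(path.read_text()))

Path("train_data").mkdir(exist_ok=True)
Path("train_data/Olympus.json").write_text(json.dumps(merged))
print(f"Merged {len(merged)} samples into train_data/Olympus.json")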

(3) Download the Mipha-3B model for fine-tuning:

python download_mipha_3b.py

It will save the Mipha-3B model under the ckpts folder.

Inference

Run the following commands for inference:

model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
  --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
  --model-path $MODELDIR \
  --temperature 0 \
  --conv-mode v0

Alternatively, you can run bash predict.sh as we did.

The prediction should look like this:

Input Prompt:  Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output:  <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>

Change the --prompt argument to customize the input prompt as needed.
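
Downstream, these routing tokens determine which task-specific model handles each sub-request. Here is a small parsing sketch (our own illustration, not the repository's routing code) that extracts (task, prompt) pairs in order:

# Parse routing tokens such as <image_gen>...</image_gen> from the output.
# The token names below come from the example above; the full set of 20
# routing tokens is defined in the paper.
import re

output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill ...</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<3D_gen_image>produce a high-resolution 3D model ...</3D_gen_image>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# <token>prompt</same token>, non-greedy so adjacent pairs stay separate.
for task, prompt in re.findall(r"<([^<>/]+)>(.*?)</\1>", output):
    print(f"route to '{task}': {prompt}")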

Visual Instruction Tuning

Please refer here to prepare the instruction tuning data. In particular, store the images from the different datasets under the train_data folder.

Run the following code to fine-tune the model:

bash scripts/mipha/finetune.sh

Evaluation

To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the eval folder. The evaluation scripts live under scripts/mipha/eval/. For example, to test the model's performance on the VQAv2 dataset, simply run:

bash scripts/mipha/eval/vqav2.sh

🔮 Supported Capacities (Covering 20 Tasks)


๐Ÿ‚ Diverse Applications


Citation

If you find Olympus useful for your research and applications, please cite using this BibTeX:

@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}

Acknowledgement

Our project is built upon the following foundations:

  • Mipha: An impressive open-source project for lightweight vision-language assistants
  • LLaVA: A powerful open-source vision-language assistant project