Finetuning script for Phi3.5-Vision

#24

by 2U1 - opened Sep 11, 2024

2U1

Sep 11, 2024

https://github.com/2U1/Phi3-Vision-Finetune

I made a fine-tuning script for Phi3.5-Vision. It supprots single-image, multi-image and video dataset.
You can select each module (Vision, LLM, Projector) for fine tuning and set different learning rate for all.

Feedback and issues are welcome!

adnanPBI

Oct 30, 2024

can you share the fine-tuning script?? I want to train this model to recognize text, emojis as well as corresponding layout.

2U1

Oct 30, 2024

@adnanPBI You can visit the repo and use it !

dutta18

Dec 17, 2024

Can it be used for VQA tasks ?

2U1

Dec 18, 2024

@dutta18 Yes it could be use for VQA tasks.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment