How to do multimodal multi-task fine-tuning?
#56, opened by stephenfernandess
I am looking into ways to push the multimodal limits of the model.
I plan to leverage the multimodal capabilities of Qwen 2.5 Omni to train a model capable of:
- OCR
- ASR, AST, timestamp decoding
- translation
I have a large-scale training corpus for each of these tasks to fine-tune the model on.
I wanted to know whether there is a way to use the Hugging Face codebase to fine-tune the model on all of these tasks at once, perhaps by mixing the tasks and training on them together.
Any help on how to train this would be appreciated, in case anyone has references to code that would help me do this.
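For concreteness, one way to picture "mix and train all of these tasks together" is weighted sampling across per-task datasets, so that each training step can draw from any task. The snippet below is only a rough sketch in plain Python under stated assumptions. It is not taken from the Qwen or Hugging Face code. The function name `mix_tasks`, the task names, the weights, and the toy records are all hypothetical placeholders.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List


def mix_tasks(
    task_data: Dict[str, List[dict]],
    weights: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> Iterator[dict]:
    """Yield examples drawn from several tasks according to sampling weights.

    Hypothetical helper: each task's examples are cycled in order, and the
    task for every draw is picked at random in proportion to its weight.
    """
    rng = random.Random(seed)
    names = list(task_data)
    probs = [weights[name] for name in names]
    cursors = {name: 0 for name in names}
    for _ in range(num_samples):
        # Pick which task the next example comes from.
        name = rng.choices(names, weights=probs, k=1)[0]
        examples = task_data[name]
        # Wrap around when a small task runs out of examples.
        example = examples[cursors[name] % len(examples)]
        cursors[name] += 1
        # Tag the example so a collator can build a task-specific prompt.
        yield {"task": name, **example}


if __name__ == "__main__":
    # Toy stand-ins for real corpora (assumed record layout).
    toy = {
        "ocr": [{"input": "img_0.png", "target": "hello"}],
        "asr": [{"input": "clip_0.wav", "target": "good morning"}],
        "translation": [{"input": "bonjour", "target": "hello"}],
    }
    toy_weights = {"ocr": 0.3, "asr": 0.5, "translation": 0.2}
    mixed = list(mix_tasks(toy, toy_weights, num_samples=1000, seed=0))
    print(Counter(example["task"] for example in mixed))
```

The Hugging Face `datasets` library provides `interleave_datasets`, which accepts a `probabilities` argument and serves a similar purpose for real datasets. Whether a mixture like this works well for Qwen 2.5 Omni, and which weights to use, would still need to be checked experimentally.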