Great model!
#1, opened by Ksgk-fy
Such a cool model, congratulations on it!
Could you teach me how it's built? Any reference to relevant resources would be greatly appreciated.
Sure, it's built similarly to AnyMAL (https://arxiv.org/pdf/2309.16058) and LLaVA (https://arxiv.org/pdf/2304.08485).
If you are interested in building a similar model, the open-source LLaVA codebase is a decent start: https://github.com/haotian-liu/LLaVA
This model is a pretrained projection module, so it covers just phase 1 of AnyMAL. The dataset is https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K
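To make "projection module" a bit more concrete, here is a rough sketch of the idea, not this model's actual code. It assumes the simplest case, a single linear layer that maps frozen vision-encoder features into the LLM's embedding space, with the projected image tokens prepended to the text embeddings. All names (`make_projection`, `project`, `build_llm_input`) and the toy dimensions are hypothetical, and it uses plain Python lists so it runs without any deep-learning framework. A real implementation would use a framework and train the projection weights on image-caption pairs.

```python
import random


def make_projection(d_vision, d_llm, seed=0):
    """Return a randomly initialised weight matrix (d_vision x d_llm) and a zero bias."""
    rng = random.Random(seed)
    scale = d_vision ** -0.5
    weight = [[rng.gauss(0.0, scale) for _ in range(d_llm)] for _ in range(d_vision)]
    bias = [0.0] * d_llm
    return weight, bias


def project(features, weight, bias):
    """Map each vision feature vector (length d_vision) to a vector of length d_llm."""
    return [
        [sum(f * w for f, w in zip(vec, col)) + b
         for col, b in zip(zip(*weight), bias)]
        for vec in features
    ]


def build_llm_input(image_features, text_embeddings, weight, bias):
    """Prepend the projected image tokens to the text token embeddings."""
    return project(image_features, weight, bias) + text_embeddings


# Toy sizes, chosen only for illustration (assumptions, not the real model's dimensions).
d_vision, d_llm, n_patches, n_text = 4, 6, 3, 2
weight, bias = make_projection(d_vision, d_llm)

# Stand-ins for the outputs of a frozen vision encoder and a frozen LLM embedding table.
image_features = [[1.0] * d_vision for _ in range(n_patches)]
text_embeddings = [[0.0] * d_llm for _ in range(n_text)]

sequence = build_llm_input(image_features, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))
```

In this kind of setup only `weight` and `bias` would be updated during the first training phase, while the vision encoder and the LLM stay frozen.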