Multi-image

#2
by pbarker - opened

Can this support multi-image?

The model was not trained on any multi-image data, and the preprocessor in this codebase does not currently support interleaved image/text messages.

The model's design does, in principle, allow it to handle multiple images by concatenating them into one very long input sequence, so it is still possible to try multi-image input (although that would require tweaking the preprocessor). However, we have not experimented with this ourselves.
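To make the concatenation idea concrete, here is a rough sketch of what a tweaked preprocessor might do. It is not this codebase's preprocessor: the token IDs, the function names `image_placeholder` and `build_sequence`, and the `max_len` limit are all hypothetical, chosen only for illustration. It just lays out one placeholder span per image, followed by the text tokens, and checks that the result still fits in the context window.

```python
from typing import List, Sequence

# Hypothetical special-token IDs; the real model's vocabulary will differ.
IMG_START, IMG_PATCH, IMG_END = 1, 2, 3


def image_placeholder(num_patches: int) -> List[int]:
    """Token span reserved for one image: start marker, patch slots, end marker."""
    return [IMG_START] + [IMG_PATCH] * num_patches + [IMG_END]


def build_sequence(
    patches_per_image: Sequence[int],
    text_tokens: Sequence[int],
    max_len: int,
) -> List[int]:
    """Concatenate one placeholder span per image, then append the text tokens.

    Raises ValueError if the combined sequence exceeds max_len (an assumed
    context-window limit), since every extra image lengthens the input.
    """
    seq: List[int] = []
    for num_patches in patches_per_image:
        seq.extend(image_placeholder(num_patches))
    seq.extend(text_tokens)
    if len(seq) > max_len:
        raise ValueError(
            f"sequence length {len(seq)} exceeds max_len {max_len}"
        )
    return seq
```

For example, `build_sequence([2, 1], [10, 11], max_len=16)` lays out two image spans followed by two text tokens. Since the model was not trained on multi-image data, there is no guarantee it will make good use of a sequence built this way.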

Would be nice to have such a feature (especially for a multimodal RAG scenario...)
