s k

madstuntman11

AI & ML interests

None yet

Recent Activity

Organizations

None yet

madstuntman11's activity

upvoted an article 27 days ago
view article
Article

Visual Document Retrieval Goes Multilingual

By marco and 1 other β€’
β€’ 74
upvoted an article 2 months ago
view article
Article

DeepSearch Using Visual RAG in Agentic Frameworks πŸ”Ž

By paultltc and 1 other β€’
β€’ 33
reacted to merve's post with ❀️ 7 months ago
view post
Post
3893
If you have documents that do not only have text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try πŸ€—

Why? Documents consist of multiple modalities: layout, table, text, chart, images. Document processing pipelines often consist of multiple models and they're immensely brittle and slow. πŸ₯²

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma, it operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval, and if you want to do retrieval augmented generation, find the closest document, and do not process it, give it directly to a VLM like Qwen2-VL (as image input) and give your text query. 🀝

This is much faster + you do not lose out on any information + much easier to maintain too! πŸ₯³

Multimodal RAG merve/multimodal-rag-66d97602e781122aae0a5139 πŸ’¬
Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e πŸ“–
  • 2 replies
Β·
reacted to merve's post with ❀️ 7 months ago
view post
Post
5662
I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval πŸ“– it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation πŸ’¬ directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new 🐭 Byaldi library by @bclavie πŸ€—
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
upvoted an article 7 months ago
upvoted an article 8 months ago
view article
Article

πŸ€— PEFT welcomes new merging methods

By smangrul and 1 other β€’
β€’ 19
upvoted an article 10 months ago
view article
Article

ColPali: Efficient Document Retrieval with Vision Language Models πŸ‘€

By manu β€’
β€’ 253