7 4 1

Filippo B

Filippo

GenAI, LLMs, VLMs, accelerated computing, information retrieval, workflows orchestration

liked a Space 8 days ago

updated a collection 22 days ago

upvoted a paper 22 days ago

Filippo's activity

liked a Space 8 days ago

The ultimate guide to training LLM on large GPU Clusters

updated a collection 22 days ago

upvoted a paper 22 days ago

upvoted an article 3 months ago

Article

•

Aug 20, 2024

• 13

updated 2 collections 5 months ago

upvoted a paper 5 months ago

reacted to merve's post with 🔥 5 months ago

Post

5616

I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb