---
title: ColPali Visual Retrieval
emoji: 🔍
colorFrom: green
colorTo: blue
sdk: docker
sdk_version: "3.11"
app_file: app.py
pinned: false
---

# ColPali Visual Retrieval with Vespa

A visual document retrieval system that combines **ColPali** (a ColBERT-style late-interaction model over patch-level image embeddings) with **Vespa** for scalable document search and question answering.

## 🌟 Features

- **Visual Document Search**: Search through PDF documents using natural language queries
- **Token-level Similarity Maps**: See which regions of a page match each query token
- **AI-Powered Chat**: Ask questions about retrieved documents using Google Gemini
- **Multiple Ranking Methods**: Choose between ColPali, BM25, or Hybrid ranking

## 🚀 Try It Out

1. Enter a natural language query in the search box
2. Select your preferred ranking method
3. Click a token button to see the similarity map for that token
4. Ask follow-up questions in the chat interface

## 📄 Sample Queries

- "Pie chart with model comparison"
- "Speaker diarization evaluation"
- "Results table from dense retrieval"
- "Graph showing training loss"
- "Architecture diagram with transformer"

## 🛠️ Technology Stack

- **ColPali**: Vision-language model for document understanding
- **Vespa**: Distributed search engine for scalability
- **FastHTML**: Web framework for the UI
- **Google Gemini**: AI-powered question answering

## 📊 About the Dataset

This demo uses ~400 pages from AI-related research papers published in 2024. The pages are embedded with ColPali, producing visual embeddings that enable semantic search over the document images.

## 🔗 Links

- [ColPali Paper](https://arxiv.org/abs/2404.09317)
- [Vespa Documentation](https://docs.vespa.ai/)
- [Blog Post](https://blog.vespa.ai/visual-retrieval-with-colpali-and-vespa/)
- [GitHub Repository](https://github.com/vespa-engine/vespa/tree/master/examples/colpali-visual-retrieval)
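
## 🧮 Late-Interaction Scoring Sketch

The ColPali ranking and the token-level similarity maps described above both rest on comparing each query token embedding with every patch embedding of a page. The snippet below is a minimal sketch of that idea in plain Python, assuming the usual late-interaction (MaxSim) formulation: for each query token, take its best-matching patch and sum those maxima. It is not the rank profile this demo runs in Vespa. The function names (`token_similarity_map`, `maxsim_score`, `rank_pages`) and the toy 2-dimensional vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def token_similarity_map(query_token: Vector, patches: Sequence[Vector]) -> List[float]:
    """Similarity of one query token to every patch of a page.

    The result has one value per patch; laid out on the page's patch grid,
    it is the kind of data a per-token similarity map would visualize.
    """
    return [dot(query_token, patch) for patch in patches]


def maxsim_score(query_tokens: Sequence[Vector], patches: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best patch match."""
    if not patches:
        raise ValueError("a page needs at least one patch embedding")
    return sum(max(token_similarity_map(q, patches)) for q in query_tokens)


def rank_pages(
    query_tokens: Sequence[Vector], pages: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page and return (page_id, score) pairs, best first."""
    scored = [(page_id, maxsim_score(query_tokens, p)) for page_id, p in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumption): two query tokens, two pages with two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.3]],
}

for page_id, score in rank_pages(query, pages):
    print(page_id, round(score, 3))
```

With this toy data, `page_a` ranks above `page_b` because each query token has a close match among its patches. Real ColPali embeddings have many more dimensions and patches per page, and a production deployment would evaluate the score inside the search engine rather than in application code.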