---
title: ColPali Visual Retrieval
emoji: 🔍
colorFrom: green
colorTo: blue
sdk: docker
sdk_version: "3.11"
app_file: app.py
pinned: false
---

# ColPali Visual Retrieval with Vespa

A visual document retrieval demo that pairs **ColPali** with **Vespa** for scalable document search and question answering. ColPali is a vision language model that embeds page images as patch-level vectors and matches them against query tokens using ColBERT-style late interaction.
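To make the late-interaction idea concrete, here is a minimal NumPy sketch of MaxSim scoring: every query token is compared with every page patch, each token keeps its best match, and the per-token maxima are summed. The function name `maxsim_score`, the embedding dimension, and the patch count are illustrative assumptions, and the random arrays stand in for real model output. This is not the demo's actual Vespa ranking code.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one page.

    query_emb: (num_query_tokens, dim) array, one row per query token.
    page_emb:  (num_patches, dim) array, one row per image patch.
    """
    # Dot product of every query token with every page patch.
    sim = query_emb @ page_emb.T  # shape: (num_query_tokens, num_patches)
    # Each query token keeps only its best-matching patch; sum over tokens.
    return float(sim.max(axis=1).sum())


# Toy data standing in for real embeddings (shapes are assumptions).
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
pages = [rng.normal(size=(1024, 128)) for _ in range(3)]

scores = [maxsim_score(query, page) for page in pages]
# Page indices ordered from best to worst match.
ranking = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Because each query token is scored independently before the sum, the same similarity matrix can also show which patches a given token matched.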

## 🌟 Features

- **Visual Document Search**: Search through PDF documents using natural language queries
- **Token-level Similarity Maps**: Visualize exactly which parts of documents match your query
- **AI-Powered Chat**: Ask questions about retrieved documents using Google Gemini
- **Multiple Ranking Methods**: Choose between ColPali, BM25, or Hybrid ranking
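A token-level similarity map can be sketched as the similarity between one query token and every page patch, reshaped onto the patch grid and scaled for display. The helper name `token_similarity_map`, the 32x32 default grid, and the row-major patch order are assumptions made for illustration; the demo's own implementation may differ.

```python
import numpy as np


def token_similarity_map(
    token_emb: np.ndarray,
    page_emb: np.ndarray,
    grid: tuple[int, int] = (32, 32),
) -> np.ndarray:
    """Return a (rows, cols) map of one query token's similarity to each patch.

    token_emb: (dim,) embedding of a single query token.
    page_emb:  (rows * cols, dim) patch embeddings, assumed row-major.
    """
    rows, cols = grid
    if page_emb.shape[0] != rows * cols:
        raise ValueError("patch count does not match the grid size")
    # Similarity of the token to every patch, laid out on the patch grid.
    sim = (page_emb @ token_emb).reshape(rows, cols)
    # Min-max scale to [0, 1] so the map can be drawn as a heatmap overlay.
    lo, hi = sim.min(), sim.max()
    if hi == lo:
        return np.zeros_like(sim, dtype=float)
    return (sim - lo) / (hi - lo)


# Toy example: a 2x2 grid with 2-dimensional embeddings.
token = np.array([1.0, 0.0])
patches = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
heatmap = token_similarity_map(token, patches, grid=(2, 2))
print(heatmap)
```

The resulting array can be upsampled to the page image size and blended over it to highlight the matching regions.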

## 🚀 Try It Out

1. Enter a natural language query in the search box
2. Select your preferred ranking method
3. Click a query token button to see its similarity map over the retrieved pages
4. Ask follow-up questions in the chat interface

## 📄 Sample Queries

- "Pie chart with model comparison"
- "Speaker diarization evaluation"
- "Results table from dense retrieval"
- "Graph showing training loss"
- "Architecture diagram with transformer"

## 🛠️ Technology Stack

- **ColPali**: Vision language model that embeds document page images for retrieval
- **Vespa**: Distributed search engine for scalability
- **FastHTML**: Modern web framework for the UI
- **Google Gemini**: AI-powered question answering

## 📊 About the Dataset

This demo uses ~400 pages from AI-related research papers published in 2024. The documents are processed using ColPali to create visual embeddings that enable semantic search across document images.

## 🔗 Links

- [ColPali Paper](https://arxiv.org/abs/2407.01449)
- [Vespa Documentation](https://docs.vespa.ai/)
- [Blog Post](https://blog.vespa.ai/visual-retrieval-with-colpali-and-vespa/)
- [GitHub Repository](https://github.com/vespa-engine/vespa/tree/master/examples/colpali-visual-retrieval)