---
license: apache-2.0
datasets:
- racineai/OGC_2_vdr-visRAG-colpali
language:
- fr
- en
- de
- es
- it
base_model:
- HuggingFaceTB/SmolVLM-500M-Instruct
---
# Flantier-SmolVLM-500M-dse
A lightweight multimodal vision-language model specialized for technical document retrieval.
## Overview
Flantier-SmolVLM-500M-dse (Document Screenshot Embedding) is a 500M-parameter vision-language model designed for efficient retrieval of technical documentation. It encodes document screenshots directly into embeddings, so text, images, and layout are all preserved without a separate content extraction step.
## Key Features
- **Efficient Retrieval**: Generates document and query embeddings for semantic similarity search
- **Multimodal Understanding**: Processes text, diagrams, charts, and tables in their original layout
- **Lightweight Architecture**: Only 500M parameters, runs on consumer GPUs
- **No Preprocessing Required**: Works directly on document screenshots
## Installation
```bash
pip install torch transformers accelerate pillow
```
## Usage Example
```python
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# Load model and processor
processor = AutoProcessor.from_pretrained("racineai/Flantier-SmolVLM-500M-dse")
model = AutoModelForVision2Seq.from_pretrained(
    "racineai/Flantier-SmolVLM-500M-dse",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load document image
document_image = Image.open("technical_document.jpg")

# Process for document embedding
doc_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"}
        ]
    },
]
doc_prompt = processor.apply_chat_template(doc_messages, add_generation_prompt=True)
doc_inputs = processor(text=doc_prompt, images=[document_image], return_tensors="pt").to(model.device)

# Generate document embedding
with torch.no_grad():
    doc_outputs = model(**doc_inputs, output_hidden_states=True, return_dict=True)
    doc_embedding = doc_outputs.hidden_states[-1][:, -1]  # Last token embedding
    doc_embedding = torch.nn.functional.normalize(doc_embedding, p=2, dim=-1)

# Process query embedding
query = "What are the specifications of this component?"
query_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": query}
        ]
    },
]
query_prompt = processor.apply_chat_template(query_messages, add_generation_prompt=True)
query_inputs = processor(text=query_prompt, return_tensors="pt").to(model.device)

# Generate query embedding
with torch.no_grad():
    query_outputs = model(**query_inputs, output_hidden_states=True, return_dict=True)
    query_embedding = query_outputs.hidden_states[-1][:, -1]  # Last token embedding
    query_embedding = torch.nn.functional.normalize(query_embedding, p=2, dim=-1)

# Calculate similarity
similarity = torch.nn.functional.cosine_similarity(query_embedding, doc_embedding)
print(f"Similarity score: {similarity.item():.4f}")
```
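
The example above L2-normalizes both embeddings before comparing them. For unit-length vectors, cosine similarity reduces to a plain dot product, which is why normalized embeddings can be scored with a single matrix multiplication when there are many documents. The dependency-free sketch below checks this on made-up 3-dimensional vectors. The helper names (`l2_normalize`, `dot`, `cosine_similarity`) are illustrative only and are not part of the model or of `transformers`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for real embeddings (values are made up).
query = [1.0, 2.0, 2.0]
doc = [2.0, 1.0, 2.0]

cos = cosine_similarity(query, doc)
dot_of_units = dot(l2_normalize(query), l2_normalize(doc))
print(f"cosine: {cos:.4f}  dot of unit vectors: {dot_of_units:.4f}")
```

Both numbers agree up to floating-point rounding, so once embeddings are stored normalized, scoring only needs dot products.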
## Applications
- **Technical Document Retrieval**: Find relevant documents based on technical queries
- **Technical Support Systems**: Match user questions to relevant documentation
- **Engineering Knowledge Management**: Index and search technical specifications, diagrams, and reports
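
As a rough illustration of the retrieval use cases above, the following sketch keeps page embeddings in memory and returns the top-k pages for a query embedding. It assumes the embeddings are already L2-normalized, so the dot product is the cosine similarity. `ScreenshotIndex`, the document IDs, and the 2-dimensional vectors are all hypothetical placeholders; in practice the vectors would come from the model and a dedicated vector store would likely replace this class.

```python
import heapq


class ScreenshotIndex:
    """Hypothetical in-memory index over unit-length page embeddings."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, doc_id, embedding):
        """Store one page embedding under a caller-chosen ID."""
        self._ids.append(doc_id)
        self._vectors.append(list(embedding))

    def search(self, query_embedding, k=3):
        """Return up to k (doc_id, score) pairs, best score first."""
        scored = [
            (doc_id, sum(q * d for q, d in zip(query_embedding, vec)))
            for doc_id, vec in zip(self._ids, self._vectors)
        ]
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up unit vectors standing in for real page embeddings.
index = ScreenshotIndex()
index.add("pump_manual_p12", [1.0, 0.0])
index.add("wiring_diagram_p3", [0.0, 1.0])
index.add("datasheet_p1", [0.6, 0.8])

results = index.search([0.8, 0.6], k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.2f}")
```

A brute-force scan like this is linear in the number of pages, which is usually fine for small collections; larger ones typically call for an approximate nearest-neighbor index.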
## Training Methodology
This model was trained using the Document Screenshot Embedding (DSE) approach, which treats document screenshots as a unified input format. This eliminates the need for content extraction preprocessing while preserving all visual and textual information in documents.
## Citation
```
@misc{flantier-smolvlm-dse,
author = {racine.ai},
title = {Flantier-SmolVLM-500M-dse: A Lightweight Document Screenshot Embedding Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/racineai/Flantier-SmolVLM-500M-dse}
}
```
## License
This model is released under the Apache 2.0 license.