Spaces:
Running
on
Zero
Running
on
Zero
Commit
β’
ecb8a41
1
Parent(s):
ff86a3f
cta
Browse files
app.py
CHANGED
@@ -142,6 +142,7 @@ To make the ColPali models work even better we might want a dataset of query/ima
|
|
142 |
One way in which we might go about generating such a dataset is to use an VLM to generate synthetic queries for us.
|
143 |
This space uses the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) VLM model to generate queries for a document, based on an input document image.
|
144 |
|
|
|
145 |
|
146 |
This [blog post](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) gives an overview of how you can use this kind of approach to generate a full dataset for fine-tuning ColPali models.
|
147 |
|
|
|
142 |
One way in which we might go about generating such a dataset is to use an VLM to generate synthetic queries for us.
|
143 |
This space uses the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) VLM model to generate queries for a document, based on an input document image.
|
144 |
|
145 |
+
**Note** there is a lot of scope for improving to prompts and the quality of the generated queries! If you have any suggestions for improvements please open a Discussion!
|
146 |
|
147 |
This [blog post](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) gives an overview of how you can use this kind of approach to generate a full dataset for fine-tuning ColPali models.
|
148 |
|