
Example usage with remote vLLM

#45
by kpierzynski - opened

Hello,

I need to host this model via a remote vLLM instance. To do that, I'm using the following script. The problem is that I'm not getting DocTags output, but this instead:

1>185>43>316>51>Topic I - Introduction>
1>236>56>252>64>§ 11>
...

Questions: Is this a vLLM problem? Am I missing some post-processing step? Is the code just bad? Can this not be done with remote vLLM? I tried using DocTagsDocument and DoclingDocument, but both returned empty outputs.

Thanks!

import base64

from openai import OpenAI

# Read the page image and base64-encode it for the data URL
with open("example.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="x"
)

PROMPT_TEXT = "Convert this page to docling."

response = client.chat.completions.create(
    model="ds4sd/SmolDocling-256M-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
                {"type": "text", "text": f"{PROMPT_TEXT}"},
            ],
        }
    ],
    max_tokens=8192 - 17,
    temperature=0.0,
)

doctags = response.choices[0].message.content.strip()
print(doctags)
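For context on the garbled output above: well-formed DocTags wrap each page element in a named tag followed by four `<loc_…>` bounding-box tokens, e.g. `<section_header_level_1><loc_185><loc_43><loc_316><loc_51>Topic I - Introduction</section_header_level_1>`, and the bare numbers appear to be what remains once those special tokens are stripped. A toy parser for the well-formed shape, illustrative only and not the docling_core API:

```python
import re

# Illustrative only (not the docling_core API): each DocTags element is a
# named tag, four <loc_...> bounding-box tokens, then the element text.
DOCTAG_RE = re.compile(
    r"<(?P<tag>[a-z_0-9]+)>"
    r"<loc_(?P<x1>\d+)><loc_(?P<y1>\d+)><loc_(?P<x2>\d+)><loc_(?P<y2>\d+)>"
    r"(?P<text>.*?)</(?P=tag)>",
    re.S,
)

def parse_doctags(doctags: str):
    """Yield (tag, (x1, y1, x2, y2), text) for each well-formed element."""
    for m in DOCTAG_RE.finditer(doctags):
        yield (
            m.group("tag"),
            tuple(int(m.group(k)) for k in ("x1", "y1", "x2", "y2")),
            m.group("text"),
        )

sample = ("<section_header_level_1><loc_185><loc_43><loc_316><loc_51>"
          "Topic I - Introduction</section_header_level_1>")
print(list(parse_doctags(sample)))
```

If a parser like this finds nothing in the model output, the DocTags are malformed, which would also explain why DocTagsDocument came back empty.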

I think you are skipping the special tokens, which are required for the DocTags.
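In case it helps others: vLLM's OpenAI-compatible server accepts extra sampling parameters through the client's `extra_body`, and its `skip_special_tokens` option defaults to true, which strips the DocTags markup. A sketch of the corrected request, with the model name and URL copied from the script above (the helper function itself is mine, not part of any API):

```python
def build_doctags_request(model: str, image_base64: str, prompt: str) -> dict:
    """Build kwargs for client.chat.completions.create (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 8192,
        "temperature": 0.0,
        # vLLM-specific: keep special tokens so the DocTags markup survives
        "extra_body": {"skip_special_tokens": False},
    }

kwargs = build_doctags_request(
    "ds4sd/SmolDocling-256M-preview", "<base64>", "Convert this page to docling."
)
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
# response = client.chat.completions.create(**kwargs)
```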

Thanks for the answer, you are right: 'skip_special_tokens' solved the problem.

kpierzynski changed discussion status to closed
