Compatibility with olmOCR repo

by pszemraj - opened Apr 5

Apr 5

•

Great work! Since you mention this is a "drop in replacement", can I "drop it in" to https://github.com/allenai/olmocr with the --model arg for python -m olmocr.pipeline? Figured I'd ask before trying as you mention changes to the amounts of metadata it wants to see, etc.

edit: I know you provide an example with vLLM but this would require rebuilding olmocr.pipeline to have a CLI script I can point at a directory of PDF files

yifei-reducto

Reducto org Apr 6

Great work! Since you mention this is a "drop in replacement", can I "drop it in" to https://github.com/allenai/olmocr with the --model arg for python -m olmocr.pipeline? Figured I'd ask before trying as you mention changes to the amounts of metadata it wants to see, etc.

edit: I know you provide an example with vLLM but this would require rebuilding olmocr.pipeline to have a CLI script I can point at a directory of PDF files

Hi @pszemraj , the model should mostly be compatible with olmocr pipeline, but with some tweaks: the prompt is different (you might want to modify this: https://github.com/allenai/olmocr/blob/main/olmocr/prompts/prompts.py), and the model arch is now Qwen2.5-vl instead of Qwen2.0-vl. The rest of it should be the same.

CREET01

Apr 7

Any follow up to this would be greatly appreciated @pszemraj @yifei-reducto

pszemraj

Apr 7

•

edited Apr 7

thanks @yifei-reducto ! In the meantime I tried using the model with the original pipeline.py with some updates such as manually forcing the prompts to be the same as the ones you specify, etc. I ran into some strange issues even after inference 'worked' like wild hallucinations/repeats etc, so I abandoned the original pipeline code/sglang and opted for your vLLM approach.

I workshopped async_pipeline.py in this gist with gemini-2.5 and it seems to work pretty well for batch inference.

Don't quote me on this, but maybe even an order of magnitude faster than what I saw with the original (olmOCR) inference code.

Quick overview of the process:

ensure you have vllm, flash-attn, other deps installed as needed (see script). flashinfer is nice to have but how to get it to install is out of scope here lol
serve the model locally in a separate tmux/screen/terminal with vllm serve reducto/RolmOCR
after the endpoint is ready run python async_pipeline.py --input_dir ./directory-of-pdfs (output dir inferred/named based on input dir, or pass --output_dir ./out)

PDFs are converted to images which are fired off async in batches of --concurrency_limit for fast vLLM inference. Can''t claim the code to be fully optimal, but it works well enough based on my tests - hope this helps anyone reading!

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment