@hakunamatata1997 on Hugging Face: "Can someone suggest me a good open source vision model which performs good at…"

hakunamatata1997

posted an update May 22, 2024

Can someone suggest a good open-source vision model that performs well at OCR?

victor

May 22, 2024

Interested too. Is there no OCR leaderboard?

osanseviero

May 22, 2024

cc @merve, who likely knows

merve

May 22, 2024

@HakunaMatata1997 hello!
Off the top of my head, I can't think of a model built specifically for OCR; I have mostly been using easyocr. OCR is pretty much a solved problem, so most of the AI work around documents focuses on document understanding (which is more than image -> text: it involves text, charts, tables, the whole layout, and more).
If you really want OCR, there are models like https://huggingface.co/facebook/nougat-base, which converts PDFs to Markdown, for instance.
I can also recommend some models for document understanding in general (which work on text + chart + image + layout), either zero-shot or as a backbone to fine-tune.
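
For anyone who wants to try the easyocr route mentioned above, here is a minimal sketch. It assumes `pip install easyocr` and that `Reader.readtext` returns `(bounding box, text, confidence)` entries; the helper names `join_confident_text` and `ocr_image` and the 0.5 confidence threshold are made up for illustration and are not part of easyocr.

```python
from typing import List, Sequence, Tuple

# One easyocr detection: (bounding box corners, recognized text, confidence in [0, 1]).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def join_confident_text(detections: List[Detection], min_conf: float = 0.5) -> str:
    """Keep detections at or above min_conf and join their text with spaces.

    Hypothetical helper; the 0.5 default is an arbitrary illustrative threshold.
    """
    return " ".join(text for _bbox, text, conf in detections if conf >= min_conf)


def ocr_image(path: str, languages: Sequence[str] = ("en",)) -> str:
    """Run easyocr on one image file and return the filtered text."""
    import easyocr  # imported lazily so the helper above works without it

    reader = easyocr.Reader(list(languages))  # downloads model weights on first use
    return join_confident_text(reader.readtext(path))
```

Calling `ocr_image("receipt.png")` (a placeholder path) would return the recognized text as one string. Filtering on confidence is only one possible design choice; for documents where layout matters, keeping the bounding boxes is probably more useful than flattening to a string.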

hakunamatata1997

May 23, 2024

@merve more specifically, I mean something that understands text in images well enough that the VLM's responses are accurate.

PKaushik

May 22, 2024

You can check out this one as well: https://huggingface.co/spaces/mindee/doctr

Cuiunbo

May 22, 2024

If you need the model both to do some difficult reasoning about the information in the image and to output the text in the image as is:
QwenVL-Base, MiniCPM-Llama3-V-2_5, Fuyu-8B

And here are some good OCR-related leaderboards, on which you can also find a lot of very strong models.
For example, OCRBench converts many evaluations from the era of proprietary two-stage OCR into an end-to-end model format.
I recently came across one called reka-vibe-eval, which asks many questions about rich documents.

Cuiunbo

May 22, 2024

If you want a model that only does OCR, I think you can try the Paddle series.
