Interested too, there's no OCR leaderboard?
@HakunaMatata1997
hello!
Off the top of my head I can't think of a specific OCR model; I was mostly using easyocr. Plain OCR is pretty much a solved problem, so most of the AI work around docs is focused on document understanding (it's more than image -> text: it involves text, charts, tables, the whole layout and more).
If you really want OCR, there are models like https://huggingface.co/facebook/nougat-base, which converts PDFs to markdown, for instance.
I can also recommend some models for document understanding in general (which work on text + chart + image + layout), either zero-shot or as a backbone to fine-tune.
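For reference, the easyocr route mentioned above looks roughly like the sketch below. It assumes `easyocr` is installed and that `readtext` returns `(bounding box, text, confidence)` tuples; the helper names `keep_confident` and `read_image` and the 0.5 threshold are made up for illustration, not part of the library.

```python
from typing import List, Sequence, Tuple

# One easyocr detection: (bounding box corners, recognized text, confidence).
Detection = Tuple[list, str, float]


def keep_confident(detections: Sequence[Detection], min_conf: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least min_conf.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    return [text for _bbox, text, conf in detections if conf >= min_conf]


def read_image(path: str, languages: Sequence[str] = ("en",)) -> List[Detection]:
    """Run easyocr on one image file (hypothetical wrapper).

    The import is done lazily because easyocr is heavy and downloads
    model weights the first time a Reader is created.
    """
    import easyocr

    reader = easyocr.Reader(list(languages), gpu=False)
    return reader.readtext(path)


# Usage (requires easyocr and an image on disk):
#   print(keep_confident(read_image("receipt.png")))
```

Filtering on confidence is just one simple way to drop noisy detections before passing the text on to something else.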
@merve more specifically, I mean something that understands the text in images well enough that the VLM's responses are accurate.
If you need the model both to do some difficult reasoning about the information in the image and to output the text in the image as is:
QwenVL-Base, MiniCPM-Llama3-V-2_5, Fuyu-8B
And here are some good OCR-related leaderboards, on which you can also find a lot of very strong models.
For example, OCRBench converts many proprietary OCR-era evaluations (two-stage) into an end-to-end model format.
I recently came across one called reka-vibe-eval, which asks many questions about rich documents.
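If you just want a quick sanity check of how close a model's "text as is" output is to the ground truth, character error rate (CER) is a common OCR metric. The sketch below is a plain-Python version; it is not claimed to be the scoring used by OCRBench or any other leaderboard, and the function names are only illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        # Convention chosen for this sketch: empty reference counts as
        # fully wrong unless the hypothesis is also empty.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


# Usage: compare ground-truth text with what the model returned.
#   cer("total", "t0tal")  # one substituted character out of five
```

Lower is better, and 0.0 means the model reproduced the text exactly.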
If you want a model that only does OCR, I think you can try the Paddle series.