Poor performance with simple table extraction task

#34

by hanshupe - opened Mar 30

Mar 30

There is a lot of hype around multimodal models such as SmolDocling. I would like to know whether others have had a similar experience in practice: while they can do impressive things, they still struggle with table extraction in cases that are straightforward for humans.

Attached is a simple example. All I need is a reconstruction of the table as a flat CSV that preserves all empty cells correctly. Which open-source model is able to do that?
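To make the target format concrete, here is a minimal sketch of what "flat CSV with empty cells preserved" could look like, using only the Python standard library. The helper name `grid_to_csv` and the sample table are hypothetical and not taken from the attached example; the assumption is that an extractor returns the table as a list of rows in which `None` marks an empty cell.

```python
import csv
import io


def grid_to_csv(grid):
    """Serialize a list of rows to CSV text, keeping every empty cell as an empty field."""
    # Every output row gets the same number of columns as the widest input row.
    width = max((len(row) for row in grid), default=0)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in grid:
        # None marks an empty cell: write it as an empty field instead of dropping it.
        cells = ["" if cell is None else str(cell) for cell in row]
        # Pad short rows so trailing empty cells are not lost.
        cells += [""] * (width - len(cells))
        writer.writerow(cells)
    return buf.getvalue()


# Hypothetical table with empty cells in the middle and at the end of rows.
example = [
    ["Item", "Q1", "Q2"],
    ["A", 10, None],
    ["B", None, 7],
    ["C"],
]

print(grid_to_csv(example))
```

A quick sanity check for any extractor output is that every line of the CSV has the same number of separators; a row with fewer fields usually means an empty cell was silently dropped.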

MaxMnemonic

Docling org Mar 30

edited Mar 30

Thanks, @hanshupe. Try Docling (docling.io) and see whether your table is extracted properly with it; we use a dedicated table reconstruction model there.
However, we are working on improving the weights for SmolDocling, and the results are promising. It should catch up with the dedicated model on the quality of table structure extraction.
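For anyone who wants to try this, a rough sketch of exporting each detected table to its own CSV file with the Docling Python package might look like the following. It assumes `docling` (and its pandas dependency) is installed and that `DocumentConverter` and `TableItem.export_to_dataframe()` behave as in the installed version; the function name `export_tables_to_csv`, the output directory, and the file naming are hypothetical choices for illustration.

```python
from pathlib import Path


def export_tables_to_csv(source, out_dir="tables"):
    """Convert a document with Docling and write each detected table to a CSV file."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Run the default conversion pipeline on the input file or URL.
    result = DocumentConverter().convert(source)

    written = []
    for i, table in enumerate(result.document.tables, start=1):
        # Each table item can be exported as a pandas DataFrame.
        df = table.export_to_dataframe()
        path = out / f"table_{i}.csv"
        # index=False keeps the CSV flat: no extra row-number column.
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

Calling `export_tables_to_csv("my_table.png")` (a hypothetical file name) should return the list of CSV paths written; whether empty cells come out correctly still has to be checked against the original table.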

hanshupe

Mar 31

Thanks, yes, Docling does indeed perform much better here. Is it documented anywhere which table reconstruction model and OCR mode are used? And is docTR support planned?
