Poor performance with simple table extraction task
There is a lot of hype around multimodal models such as SmolDocling. I would like to know whether others have had a similar experience in practice: while they can do impressive things, they still struggle with table extraction in cases that are straightforward for humans.
Attached is a simple example. All I need is a reconstruction of the table as a flat CSV that preserves all empty cells correctly. Which open-source model is able to do that?
Thanks
@hanshupe
Try Docling (docling.io) and see whether your table is extracted properly with it; we use a dedicated table reconstruction model there.
However, we are working on improved weights for SmolDocling, and the results are promising: it should catch up with the dedicated model on the quality of table structure extraction.
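For reference, here is a rough sketch of how the Docling route could produce the flat CSV asked for above. It assumes the `docling` Python package is installed and follows the pattern of Docling's table export example (`DocumentConverter().convert(...)`, then `export_to_dataframe()` on each detected table). The helper names `grid_to_csv` and `extract_tables_as_csv` and the file name `table.pdf` are made up for illustration, and the exact call signatures may differ between Docling versions.

```python
import csv
import io


def grid_to_csv(grid):
    """Serialize rows of cells to CSV text, keeping empty cells as empty fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in grid:
        # None and "" both become an empty field, so column positions are preserved.
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buf.getvalue()


def extract_tables_as_csv(source):
    """Return one CSV string per table that Docling detects in `source`.

    Assumption: docling is installed; the import is deferred so the
    CSV helper above stays usable without it.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    csv_tables = []
    for table in result.document.tables:
        # Newer Docling releases may expect export_to_dataframe(doc=result.document).
        df = table.export_to_dataframe()
        grid = [list(df.columns)] + df.values.tolist()
        csv_tables.append(grid_to_csv(grid))
    return csv_tables


if __name__ == "__main__":
    # "table.pdf" is a placeholder path; an image file should work the same way.
    for i, text in enumerate(extract_tables_as_csv("table.pdf")):
        print(f"--- table {i} ---")
        print(text)
```

Whether empty cells come out in the right columns still depends on the table structure the model predicts; the CSV step only guarantees that an empty cell in the grid is written as an empty field rather than dropped.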
Thanks, yes, Docling indeed performs much better here. Is it documented anywhere which table reconstruction model and OCR mode are used? And is docTR support planned?