Try layout-aware transformers or attention-based models that can jointly handle textual and visual features (the LayoutLM family is a common choice). Then train the model on the prepared dataset by feeding in the FIR images together with their corresponding IPC-section labels. The model learns the visual and textual cues that indicate where the IPC section appears within an FIR.
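One way to prepare those labels is BIO tagging: each OCR word is marked as beginning, inside, or outside the IPC-section span, which is the format layout-aware token classifiers are commonly trained on. The helper below is a minimal sketch under that assumption; the word list and span indices are illustrative, not from a real FIR.

```python
def bio_tags(words, span_start, span_end):
    """Label words in [span_start, span_end) as the IPC-section span (BIO scheme)."""
    tags = []
    for i, _ in enumerate(words):
        if i == span_start:
            tags.append("B-IPC")       # first word of the IPC section
        elif span_start < i < span_end:
            tags.append("I-IPC")       # continuation of the IPC section
        else:
            tags.append("O")           # everything else in the FIR
    return tags

# Hypothetical OCR output for one FIR line; indices 3..8 cover the IPC span.
words = ["FIR", "No.", "123", "u/s", "379", ",", "411", "IPC"]
print(bio_tags(words, 3, 8))
# → ['O', 'O', 'O', 'B-IPC', 'I-IPC', 'I-IPC', 'I-IPC', 'I-IPC']
```

These per-word tags, paired with each word's bounding box on the page, form one training example for the model.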
Once trained, the model can process new FIRs: it analyzes the layout and text of each FIR image and predicts the location of the IPC section. You can then crop the predicted region and parse its text to extract the IPC codes.
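The final parsing step can be a simple regular expression over the text of the predicted region. This is a sketch under the assumption that IPC codes appear as two- or three-digit section numbers, optionally with a letter suffix (e.g. 120B); the pattern and sample text are illustrative.

```python
import re

# Assumed format: 2-3 digit section numbers, optional uppercase letter suffix.
IPC_PATTERN = re.compile(r"\b(\d{2,3}[A-Z]?)\b")

def extract_ipc_codes(region_text):
    """Return IPC section numbers found in the text of the predicted region."""
    return IPC_PATTERN.findall(region_text)

# Hypothetical OCR text from the region the model predicted.
text = "Sections of law: 420, 406 and 120B IPC"
print(extract_ipc_codes(text))  # → ['420', '406', '120B']
```

In practice you would tighten the pattern to your OCR output (e.g. anchor it on nearby cues such as "u/s" or "IPC") to avoid picking up unrelated numbers like dates or FIR serial numbers.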