FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
Abstract
FS-DAG is a modular model architecture that adapts efficiently to diverse document types with few-shot learning for visually rich document understanding in resource-constrained environments.
In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with fewer than 90M parameters, making it well-suited for complex real-world Information Extraction (IE) applications where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on the information extraction task, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
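To make the architectural idea concrete, below is a minimal, hedged sketch of a modular document-graph model of the kind the abstract describes: pluggable language and vision backbones feed a shared graph reasoning module that classifies OCR tokens (nodes) into field types. This is not the official FS-DAG implementation (see the linked repository for that); all module names, feature sizes, and the simple mean-aggregation graph layer are illustrative assumptions.

```python
# Illustrative sketch only -- NOT the official FS-DAG code.
# Shows a modular graph model that fuses swappable text/vision backbones
# and predicts a field label for each document token (graph node).
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """One round of message passing: average neighbor features, then project."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes) row-normalized adjacency
        neighbor_mean = adj @ x
        return torch.relu(self.proj(torch.cat([x, neighbor_mean], dim=-1)))


class ModularDocGraphModel(nn.Module):
    """Pluggable text/vision encoders feed a shared graph module and a light head."""

    def __init__(self, text_encoder: nn.Module, vision_encoder: nn.Module,
                 hidden_dim: int = 256, num_layers: int = 2, num_labels: int = 10):
        super().__init__()
        self.text_encoder = text_encoder      # e.g. a frozen domain-specific language backbone
        self.vision_encoder = vision_encoder  # e.g. a small vision backbone over word crops
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        self.gnn = nn.ModuleList([SimpleGraphLayer(hidden_dim) for _ in range(num_layers)])
        self.head = nn.Linear(hidden_dim, num_labels)

    def forward(self, text_feats, vision_feats, adj):
        t = self.text_encoder(text_feats)
        v = self.vision_encoder(vision_feats)
        h = torch.relu(self.fuse(torch.cat([t, v], dim=-1)))
        for layer in self.gnn:
            h = layer(h, adj)
        return self.head(h)  # per-node logits over field labels


if __name__ == "__main__":
    # Toy run with random features for 12 OCR tokens on one document.
    num_nodes, hidden = 12, 256
    text_backbone = nn.Linear(768, hidden)    # stand-in for a language backbone
    vision_backbone = nn.Linear(512, hidden)  # stand-in for a vision backbone
    model = ModularDocGraphModel(text_backbone, vision_backbone, hidden_dim=hidden)

    adj = torch.rand(num_nodes, num_nodes)
    adj = adj / adj.sum(dim=-1, keepdim=True)  # row-normalize a random spatial graph
    logits = model(torch.randn(num_nodes, 768), torch.randn(num_nodes, 512), adj)
    print(logits.shape)  # torch.Size([12, 10])
```

In a few-shot setting, the backbones would typically stay frozen while only the fusion, graph, and head layers are fine-tuned on the handful of annotated documents; the details of how FS-DAG does this are in the paper and repository.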
Community
The Librarian Bot recommended the following similar papers via the Semantic Scholar API:
- QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding (2025)
- VISTA-OCR: Towards generative and interactive end to end OCR models (2025)
- From Data to Modeling: Fully Open-vocabulary Scene Graph Generation (2025)
- Toward General and Robust LLM-enhanced Text-attributed Graph Learning (2025)
- Integrating Structural and Semantic Signals in Text-Attributed Graphs with BiGTex (2025)
- Vision Graph Prompting via Semantic Low-Rank Decomposition (2025)
- Multimodal graph representation learning for website generation based on visual sketch (2025)