Excited to bring the new models that are performing exceptionally well in document OCR, image captioning, and visual understanding tasks. Megalodon-OCR and Perseus-Doc-VL have both demonstrated significant improvements across key areas. You can explore live demos on Hugging Face Spaces to compare their performance with other top-tier models available on the hub. 🤗📄
Demo of OCR & Math QA using multi-capable VLMs like MonkeyOCR-pro-1.2B, R1-One-Vision, VisionaryR1, Vision Matters-7B, and VIGAL-7B, all running together with support for both image and video inference. 🪐