Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 64
view article Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models By ahmed-masry • Oct 18, 2024 • 20
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 37
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 4 days ago • 628
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 78
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 84
view article Article Making LLMs lighter with AutoGPTQ and transformers By marcsun13 and 5 others • Aug 23, 2023 • 57
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11, 2024 • 80
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated 4 days ago • 221
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5, 2024 • 21
view article Article Introducing TextImage Augmentation for Document Images By danaaubakirova and 2 others • Aug 6, 2024 • 33
view article Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes By ybelkada and 1 other • Aug 17, 2022 • 101