Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 28
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 170
view article Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16 • 17
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 89
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Paper • 2402.10896 • Published Feb 16 • 15
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023 • 18
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 34
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Paper • 2309.02591 • Published Sep 5, 2023 • 14
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World Paper • 2308.01907 • Published Aug 3, 2023 • 11
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47