Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.10582

about 12 hours ago

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 182
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 41

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

about 5 hours ago

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018

OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality

Paper • 2401.08973 • Published Jan 17, 2024 • 1
Autoregressive Image Generation with Randomized Parallel Decoding

Paper • 2503.10568 • Published 21 days ago • 8
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published 21 days ago • 20

VisualWebInstruct

Scaling up MM data

TIGER-Lab/VisualWebInstruct-Recall

Viewer • Updated 19 days ago • 361k • 257 • 3
TIGER-Lab/VisualWebInstruct-Seed

Viewer • Updated 19 days ago • 60.3k • 495 • 16
TIGER-Lab/VisualWebInstruct

Viewer • Updated 11 days ago • 1.91M • 1.54k • 27
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published 21 days ago • 20

Multimodal Dataset

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Paper • 2407.09413 • Published Jul 12, 2024 • 11
MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 33
Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published Sep 2, 2024 • 71
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published Sep 9, 2024 • 48

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs