Hugging Face
Models: 9,157
Sort: Trending
Active filters: image-text-to-text, transformers
Model • Task • Updated • Downloads • Likes
XiaomiMiMo/MiMo-VL-7B-RL • Image-Text-to-Text • Updated 5 days ago • 1.91k • 93
google/medgemma-4b-it • Image-Text-to-Text • Updated 14 days ago • 32.8k • 318
Hcompany/Holo1-7B • Image-Text-to-Text • Updated about 2 hours ago • 25 • 49
ByteDance/Dolphin • Image-Text-to-Text • Updated 8 days ago • 3.26k • 253
Hcompany/Holo1-3B • Image-Text-to-Text • Updated about 2 hours ago • 184 • 32
google/gemma-3-27b-it • Image-Text-to-Text • Updated Mar 21 • 394k • 1.41k
XiaomiMiMo/MiMo-VL-7B-SFT • Image-Text-to-Text • Updated 5 days ago • 3.62k • 24
mlabonne/gemma-3-27b-it-abliterated-v2 • Image-Text-to-Text • Updated 4 days ago • 245 • 20
Qwen/Qwen2.5-VL-7B-Instruct • Image-Text-to-Text • Updated Apr 6 • 2.56M • 934
google/gemma-3-4b-it • Image-Text-to-Text • Updated Mar 21 • 886k • 584
nvidia/Cosmos-Reason1-7B • Image-Text-to-Text • Updated 8 days ago • 52.7k • 89
mlabonne/gemma-3-27b-it-abliterated-v2-GGUF • Image-Text-to-Text • Updated 3 days ago • 3.27k • 17
meta-llama/Llama-4-Scout-17B-16E-Instruct • Image-Text-to-Text • Updated 12 days ago • 289k • 934
vikhyatk/moondream2 • Image-Text-to-Text • Updated Apr 14 • 361k • 1.15k
ds4sd/SmolDocling-256M-preview • Image-Text-to-Text • Updated 19 days ago • 302k • 1.41k
xlangai/Jedi-7B-1080p • Image-Text-to-Text • Updated 14 days ago • 743 • 17
mlabonne/gemma-3-27b-it-qat-abliterated • Image-Text-to-Text • Updated 6 days ago • 89 • 12
Qwen/Qwen2.5-VL-3B-Instruct • Image-Text-to-Text • Updated Apr 6 • 2.83M • 392
google/gemma-3-12b-it • Image-Text-to-Text • Updated Mar 21 • 368k • 392
fancyfeast/llama-joycaption-beta-one-hf-llava • Image-Text-to-Text • Updated 19 days ago • 32.4k • 97
google/medgemma-4b-pt • Image-Text-to-Text • Updated 14 days ago • 4.24k • 82
stockmark/Stockmark-2-VL-100B-beta • Image-Text-to-Text • Updated 1 day ago • 184 • 9
ByteDance-Seed/UI-TARS-1.5-7B • Image-Text-to-Text • Updated Apr 18 • 28.5k • 287
unsloth/medgemma-27b-text-it-GGUF • Image-Text-to-Text • Updated 15 days ago • 15.6k • 26
mlabonne/gemma-3-12b-it-abliterated-v2-GGUF • Image-Text-to-Text • Updated 6 days ago • 3.41k • 8
meta-llama/Llama-3.2-11B-Vision-Instruct • Image-Text-to-Text • Updated Dec 4, 2024 • 797k • 1.45k
xlangai/Jedi-3B-1080p • Image-Text-to-Text • Updated 14 days ago • 913 • 9
Qwen/Qwen2.5-VL-72B-Instruct • Image-Text-to-Text • Updated Mar 23 • 194k • 473
google/shieldgemma-2-4b-it • Image-Text-to-Text • Updated Apr 4 • 27.8k • 105
mlabonne/gemma-3-27b-it-abliterated-GGUF • Image-Text-to-Text • Updated Apr 1 • 14.1k • 91