merve's Collections
Multimodal RAG
Updated Sep 5, 2024
vidore/colpali-v1.2 • Visual Document Retrieval • Updated Mar 14 • 69.8k • 106
Qwen/Qwen2-VL-7B-Instruct • Image-Text-to-Text • Updated Feb 6 • 1.08M • 1.18k
Qwen/Qwen2-VL-2B-Instruct • Image-Text-to-Text • Updated Jan 12 • 695k • 415
Qwen/Qwen2-72B-Instruct • Text Generation • Updated Oct 8, 2024 • 34.9k • 715
openbmb/MiniCPM-V-2_6 • Image-Text-to-Text • Updated Jan 15 • 249k • 967
Qwen2-VL-72B • Engage in multi-modal conversations with images and videos • Running • 630
ColPali • Document Retrieval • Running on Zero • 117
vidore/colpali_train_set • Viewer • Updated Sep 4, 2024 • 119k • 1.95k • 80
lmms-lab/llava-onevision-qwen2-7b-ov • Text Generation • Updated Sep 2, 2024 • 350k • 49
HuggingFaceM4/Idefics3-8B-Llama3 • Image-Text-to-Text • Updated Dec 2, 2024 • 48.6k • 275