Vision - a diwank Collection

Vision

Updated 12 days ago

apple/DepthPro

Depth Estimation • Updated Feb 28 • 1.76k • 432
rhymes-ai/Aria

Image-Text-to-Text • Updated Apr 23 • 19.1k • 629
mit-han-lab/hart-0.7b-1024px

Unconditional Image Generation • Updated Nov 17, 2024 • 13
deepseek-ai/Janus-1.3B

Any-to-Any • Updated Jan 27 • 6.05k • 588
neulab/PangeaInstruct

Updated Feb 2 • 900 • 83
genmo/mochi-1-preview

Text-to-Video • Updated Dec 18, 2024 • 33.6k • 1.22k
stabilityai/stable-diffusion-3.5-large

Text-to-Image • Updated Oct 22, 2024 • 117k • 2.81k
Freepik/flux.1-lite-8B-alpha

Text-to-Image • Updated Dec 30, 2024 • 1.76k • 419
microsoft/OmniParser

Image-Text-to-Text • Updated Dec 2, 2024 • 592 • 1.66k
mistralai/Pixtral-12B-Base-2409

Updated Feb 2 • 102
neulab/Pangea-7B

Updated Oct 24, 2024 • 12.2k • 129
jadechoghari/Ferret-UI-Llama8b

Image-Text-to-Text • Updated Jan 8 • 107 • 69
OpenGVLab/InternVL2-1B

Image-Text-to-Text • Updated Mar 25 • 36.6k • 71
OpenGVLab/InternVL2-2B

Image-Text-to-Text • Updated Mar 25 • 321k • 69
OpenGVLab/Mono-InternVL-2B

Image-Text-to-Text • Updated Mar 12 • 9.01k • 33
OpenGVLab/OmniCorpus-YT

Updated Mar 20 • 335 • 13
OpenGVLab/OmniCorpus-CC-210M

Viewer • Updated Mar 20 • 208M • 619 • 24
OpenGVLab/OmniCorpus-CC

Viewer • Updated Mar 20 • 872M • 15.2k • 17
OpenGVLab/InternVideo2_chat_8B_HD

Video-Text-to-Text • Updated Dec 18, 2024 • 1.55k • 17
OpenGVLab/ViCLIP

Updated Jun 7, 2024 • 40
OpenGVLab/ASMv2

Text Generation • Updated Feb 29, 2024 • 265 • 16
OpenGVLab/VideoChat2-IT

Viewer • Updated Jun 29, 2024 • 1.82M • 335 • 50
NimVideo/cogvideox-2b-img2vid

Image-to-Video • Updated Oct 28, 2024 • 179 • 79
BAAI/Infinity-MM

Updated Dec 13, 2024 • 20.3k • 101
nvidia/RADIO-H

Updated Apr 17 • 1.82k • 10
Spawning/PD12M

Viewer • Updated Jan 9 • 12.4M • 3.04k • 157
Shitao/OmniGen-v1

Text-to-Image • Updated Nov 7, 2024 • 4.7k • 313
InstantX/InstantIR

Image-to-Image • Updated Nov 7, 2024 • 172
nvidia/Cosmos-0.1-Tokenizer-DI8x8

Updated Dec 25, 2024 • 780 • 11
BAAI/Emu3-Chat

Text Generation • Updated Oct 24, 2024 • 1.68k • 71
briaai/RMBG-2.0

Image Segmentation • Updated 14 days ago • 65.2k • 774
Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11, 2024 • 22
rain1011/pyramid-flow-miniflux

Text-to-Video • Updated Nov 13, 2024 • 174
OpenGVLab/InternVL2-8B-MPO

Image-Text-to-Text • Updated Dec 20, 2024 • 221 • 34
mistralai/Pixtral-Large-Instruct-2411

Image-Text-to-Text • Updated Mar 16 • 413
briaai/BRIA-2.3

Text-to-Image • Updated Apr 10 • 736 • 37
microsoft/Reducio-VAE

Updated Nov 21, 2024 • 4 • 15
Lightricks/LTX-Video

Text-to-Video • Updated 6 days ago • 329k • 1.59k
apple/aimv2-3B-patch14-448

Image Feature Extraction • Updated Feb 28 • 677 • 12
THUdyh/Insight-V-Reason

Text Generation • Updated Nov 22, 2024 • 17 • 9
black-forest-labs/FLUX.1-Fill-dev

Updated Nov 25, 2024 • 359k • 718
Efficient-Large-Model/Sana_1600M_512px

Text-to-Image • Updated Jan 10 • 1.84k • 39
Efficient-Large-Model/Sana_1600M_1024px

Text-to-Image • Updated Jan 10 • 4.27k • 206
AIDC-AI/Ovis1.6-Gemma2-27B

Image-Text-to-Text • Updated Feb 26 • 69 • 62
HuggingFaceTB/SmolVLM-Base

Image-Text-to-Text • Updated Nov 28, 2024 • 5.51k • 76
THUDM/glm-edge-v-5b

Image-Text-to-Text • Updated Jan 2 • 152 • 12
rhymes-ai/Aria-Base-64K

Image-Text-to-Text • Updated Dec 1, 2024 • 48 • 14
allenai/pixmo-point-explanations

Viewer • Updated Dec 5, 2024 • 79.6k • 161 • 7
tencent/HunyuanVideo

Text-to-Video • Updated Mar 6 • 1.98k • 1.88k
tencent/HunyuanVideo-PromptRewrite

Updated Dec 6, 2024 • 34 • 49
google/paligemma2-28b-pt-896

Image-Text-to-Text • Updated Dec 5, 2024 • 247 • 48
OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated Mar 25 • 28.5k • 190
MAmmoTH-VL/MAmmoTH-VL-8B

Updated Dec 9, 2024 • 12 • 18
MAmmoTH-VL/MAmmoTH-VL-Instruct-12M

Viewer • Updated Jan 5 • 37M • 3.83k • 51
OpenGVLab/PVC-InternVL2-8B

Image-Text-to-Text • Updated Dec 17, 2024 • 12 • 8
BGLab/BioTrove

Viewer • Updated Dec 13, 2024 • 163M • 1.11k • 14
TencentARC/NVComposer

Image-to-3D • Updated Dec 16, 2024 • 38 • 7
deepseek-ai/deepseek-vl2

Image-Text-to-Text • Updated Dec 18, 2024 • 9.48k • 331
FastVideo/FastHunyuan

Text-to-Video • Updated Jan 8 • 36 • 186
BAAI/nova-d48w1536-sdxl1024

Text-to-Image • Updated Dec 21, 2024 • 8 • 7
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated Dec 25, 2024 • 296 • 610
Infinigence/Megrez-3B-Omni

Updated Feb 14 • 26 • 132
microsoft/VidTok

Updated Apr 5 • 41
TIGER-Lab/Mantis-8B-siglip-llama3

Image-Text-to-Text • Updated Nov 15, 2024 • 490 • 33
OpenGVLab/HoVLE-HD

Image-Text-to-Text • Updated Feb 9 • 36 • 8
nyu-visionx/cambrian-34b

Text Generation • Updated Jun 28, 2024 • 98 • 28
nyu-visionx/cambrian-phi3-3b

Text Generation • Updated Jul 6, 2024 • 212 • 11
nyu-visionx/Cambrian-Alignment

Viewer • Updated Jul 23, 2024 • 292k • 5.17k • 33
nvidia/Cosmos-1.0-Autoregressive-13B-Video2World

Updated Feb 8 • 70 • 31
nvidia/Cosmos-1.0-Diffusion-14B-Video2World

Updated 19 days ago • 2.62k • 56
nvidia/Cosmos-1.0-Diffusion-14B-Text2World

Updated 19 days ago • 2.81k • 59
nvidia/Cosmos-1.0-Autoregressive-12B

Updated Feb 11 • 79 • 30
StephanST/WALDO30

Object Detection • Updated Oct 9, 2024 • 235
ByteDance/Sa2VA-8B

Image-Text-to-Text • Updated Mar 19 • 1.11k • 56
OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448

Video-Text-to-Text • Updated Mar 16 • 1.63k • 19
OpenGVLab/VideoMAEv2-giant

Video Classification • Updated Feb 25 • 2.38k • 4
MiniMaxAI/MiniMax-VL-01

Image-Text-to-Text • Updated 14 days ago • 23.4k • 258
NimVideo/mochi-1-transformer-42

Text-to-Video • Updated Jan 13 • 24 • 3
ostris/Flex.1-alpha

Text-to-Image • Updated Jan 19 • 18.4k • 451
tencent/Hunyuan3D-2

Image-to-3D • Updated Apr 10 • 342k • 1.46k
deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated Feb 1 • 24.7k • 440
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 92.6k • 3.39k
Qwen/Qwen2.5-VL-72B-Instruct

Image-Text-to-Text • Updated Mar 23 • 175k • 465
nvidia/Eagle2-9B

Image-Text-to-Text • Updated Jan 28 • 3.79k • 57
m-a-p/PIN-100M

Viewer • Updated 4 days ago • 68.1k • 52.5k • 11
AIDC-AI/Ovis2-34B

Image-Text-to-Text • Updated Feb 27 • 747 • 148
microsoft/OmniParser-v2.0

Updated Mar 28 • 1.02k • 1.25k
Alpha-VLLM/Lumina-Image-2.0

Text-to-Image • Updated Mar 30 • 2.69k • 316
prithivMLmods/JSONify-Flux

Image-Text-to-Text • Updated Feb 16 • 24 • 3
Skywork/SkyReels-V1-Hunyuan-I2V

Image-to-Video • Updated Feb 24 • 824 • 269
Skywork/SkyReels-A1

Image-to-Video • Updated Mar 4 • 229 • 60
AIDC-AI/Ovis2-16B

Image-Text-to-Text • Updated Feb 27 • 19.8k • 92
curateIT/themet_openaccess_bestof

Viewer • Updated Apr 7, 2024 • 1.77k • 11 • 1
MnLgt/yolo-human-parse

Image Classification • Updated Sep 19, 2024 • 12 • 6
google/paligemma2-3b-mix-448

Image-Text-to-Text • Updated Feb 7 • 7.33k • 44
google/paligemma2-28b-mix-448

Image-Text-to-Text • Updated Feb 7 • 248 • 26
HuggingFaceTB/SmolVLM2-2.2B-Instruct

Image-Text-to-Text • Updated Apr 8 • 82.3k • 191
Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated Mar 12 • 38k • 1.27k
allenai/olmOCR-7B-0225-preview

Image-Text-to-Text • Updated Feb 25 • 408k • 651
microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated 24 days ago • 387k • 1.4k
briaai/BRIA-4B-Adapt

Text-to-Image • Updated Mar 7 • 184 • 7
DAMO-NLP-SG/VideoLLaMA3-7B

Visual Question Answering • Updated Mar 20 • 71.3k • 54
ali-vilab/ACE_Plus

Updated Mar 14 • 102 • 243
ByteDance/LatentSync-1.5

Updated Mar 16 • 63
IDEA-Research/RexSeek-3B

Image-Text-to-Text • Updated Mar 14 • 506 • 8
TIGER-Lab/Vamba-Qwen2-VL-7B

Video-Text-to-Text • Updated Mar 18 • 133 • 16
ds4sd/SmolDocling-256M-preview

Image-Text-to-Text • Updated 9 days ago • 283k • 1.39k
nvidia/Cosmos-Predict1-14B-Video2World

Updated Apr 8 • 316 • 4
nvidia/Cosmos-Transfer1-7B

Updated Apr 8 • 3.47k • 36
CohereLabs/aya-vision-32b

Image-Text-to-Text • Updated 12 days ago • 496 • 204
ByteDance/Sa2VA-26B

Image-Text-to-Text • Updated Mar 19 • 243 • 25
ChaolongYang/KDTalker

Image-to-Video • Updated Mar 30 • 13
Rapidata/OpenAI-4o_t2i_human_preference

Viewer • Updated Mar 28 • 13k • 259 • 34
McGill-NLP/AURORA

Image-to-Image • Updated Dec 21, 2024 • 302 • 4
HiDream-ai/MotionPro

Image-to-Video • Updated 6 days ago • 76
RaphaelLiu/Pusa-V0.5

Updated Apr 15 • 120 • 43
OpenGVLab/InternVL3-38B

Image-Text-to-Text • Updated about 1 month ago • 101k • 29
ShoufaChen/PixelFlow-Text2Image

Text-to-Image • Updated Apr 12 • 13
FoundationVision/Infinity

Updated Feb 18 • 70 • 49
nvidia/PhysicalAI-SmartSpaces

Updated 2 days ago • 20.4k • 27
nvidia/DAM-3B-Video

Image-Text-to-Text • Updated 18 days ago • 15.8k • 52
nvidia/DAM-3B-Self-Contained

Image-Text-to-Text • Updated 18 days ago • 5.16k • 21
OpenGVLab/VideoChat-R1_7B

Video-Text-to-Text • Updated Apr 22 • 5.35k • 7
Skywork/SkyCaptioner-V1

Video-Text-to-Text • Updated about 1 month ago • 683 • 37
Fintor/Fintor-GUI-S2

Image-Text-to-Text • Updated Apr 24 • 158 • 4
ByteDance-Seed/UI-TARS-7B-DPO

Image-Text-to-Text • Updated Jan 25 • 127k • 213
OpenGVLab/InternVL_2_5_HiCo_R64

Video-Text-to-Text • Updated 13 days ago • 167 • 3