Skier8402's Collections
multimodal

updated Apr 30
  • gpt-omni/mini-omni2

    Any-to-Any • Updated Oct 24, 2024 • 119 • 272

  • vikhyatk/moondream2

    Image-Text-to-Text • Updated Apr 14 • 350k • 1.15k

  • Running
    36

    Candle Moondream 2

    🕯

    MoonDream 2 Vision Model on the Browser: Candle/Rust/WASM


  • Running
    431

    moondream2

    🌔

    a tiny vision language model


  • Running on Zero
    1.97k

    Chat With Janus-Pro-7B

    🌍

    A unified multimodal understanding and generation model.


  • Running
    110

    Phi 4 Multimodal

    🌖

    Interact with an AI by sending text, images, or audio


  • Running on Zero
    161

    DocScope-R1

    📰

    long-context vision-language understanding.


  • Qwen/Qwen2.5-Omni-7B

    Any-to-Any • Updated Apr 30 • 235k • 1.65k

  • Qwen/Qwen2.5-Omni-3B

    Any-to-Any • Updated Apr 30 • 80.6k • 235