Deping's Collections

VLMS

updated Sep 22, 2024

  • PsiPi/liuhaotian_llava-v1.5-13b-GGUF

    Image-Text-to-Text • Updated Mar 11, 2024 • 1.05k downloads • 36 likes

  • TRI-ML/prismatic-vlms

    Image-to-Text • Updated May 6, 2024 • 20

  • bczhou/tiny-llava-v1-hf

    Image-Text-to-Text • Updated Aug 17, 2024 • 2.36k downloads • 57 likes

  • ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

    Paper • arXiv 2402.06118 • Published Feb 9, 2024 • 15 upvotes

  • LEGO:Language Enhanced Multi-modal Grounding Model

    Paper • arXiv 2401.06071 • Published Jan 11, 2024 • 13 upvotes

  • Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

    Paper • arXiv 2403.18814 • Published Mar 27, 2024 • 48 upvotes

  • Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

    Paper • arXiv 2403.16999 • Published Mar 25, 2024 • 4 upvotes

  • Salesforce/instructblip-vicuna-7b

    Image-Text-to-Text • Updated Feb 3 • 25.3k downloads • 92 likes

  • Pegasus-v1 Technical Report

    Paper • arXiv 2404.14687 • Published Apr 23, 2024 • 33 upvotes

  • List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

    Paper • arXiv 2404.16375 • Published Apr 25, 2024 • 18 upvotes

  • Needle In A Multimodal Haystack

    Paper • arXiv 2406.07230 • Published Jun 11, 2024 • 55 upvotes