NothingLQH's Collections

VLM

updated 1 day ago

  • FocusedAD: Character-centric Movie Audio Description

    Paper • 2504.12157 • Published about 1 month ago • 9

  • Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

    Paper • 2504.10465 • Published Apr 14 • 28

  • PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

    Paper • 2504.13180 • Published 29 days ago • 17

  • OS-Copilot/OS-Atlas-Base-7B

    Image-Text-to-Text • Updated Nov 19, 2024 • 2.54k • 38

  • google/siglip-so400m-patch14-384

    Zero-Shot Image Classification • Updated Sep 26, 2024 • 7.25M • 536