poonyZ's Collections

vlm data

updated Jan 7

  • MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation

    Paper • 2412.07147 • Published Dec 10, 2024 • 5

    Note: Worth noting


  • Grounding Descriptions in Images informs Zero-Shot Visual Recognition

    Paper • 2412.04429 • Published Dec 5, 2024

    Note: Average


  • Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

    Paper • 2412.05939 • Published Dec 8, 2024 • 16

    Note: Worth noting


  • Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

    Paper • 2412.08737 • Published Dec 11, 2024 • 54

    Note: Worth noting


  • VisionArena: 230K Real World User-VLM Conversations with Preference Labels

    Paper • 2412.08687 • Published Dec 11, 2024 • 13

  • BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

    Paper • 2412.07769 • Published Dec 10, 2024 • 30

    Note: Average


  • How to Synthesize Text Data without Model Collapse?

    Paper • 2412.14689 • Published Dec 19, 2024 • 53

    Note: Worth watching


  • MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

    Paper • 2412.14475 • Published Dec 19, 2024 • 55

  • Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

    Paper • 2412.15484 • Published Dec 20, 2024 • 15

    Note: Worth watching
