Improve model card: Add Hear-Your-Click context and refined metadata

#63, opened by nielsr (HF Staff)

This PR updates the model card for openai/clip-vit-base-patch32. It clarifies that this CLIP model serves as the visual encoder in the "Hear-Your-Click: Interactive Object-Specific Video-to-Audio Generation" framework.

The changes include:

  • Retaining the detailed description of the openai/clip-vit-base-patch32 model.
  • Adding a new section that introduces "Hear-Your-Click", its abstract, a link to its paper (2507.04959), and its GitHub repository (https://github.com/SynapGrid/Hear-Your-Click-2024).
  • Updating metadata with license: mit, library_name: transformers, and confirming pipeline_tag: zero-shot-image-classification.
  • Adding tags such as clip and video-to-audio for better discoverability and context.
  • Including the BibTeX citation for the "Hear-Your-Click" paper.
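The metadata changes listed above would live in the model card's YAML front matter. A minimal sketch of the resulting header, assuming only the fields named in this PR (the exact tag list on the Hub may differ):

```yaml
---
license: mit
library_name: transformers
pipeline_tag: zero-shot-image-classification
tags:
  - clip
  - video-to-audio
---
```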

This update provides valuable context for users interested in the applications of this foundational CLIP model.

