Andyrasika's Collections

multimodal

updated about 23 hours ago

This collection is for multimodal papers.

  • Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

    Paper • 2407.10387 • Published Jul 15, 2024 • 8

  • Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

    Paper • 2411.04996 • Published Nov 7, 2024 • 52

  • Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

    Paper • 2501.04001 • Published Jan 7, 2025 • 47

  • Scaling RL to Long Videos

    Paper • 2507.07966 • Published 2 days ago • 105