yamayou's Collections

Multimodal

updated Sep 22, 2024

  • Chameleon: Mixed-Modal Early-Fusion Foundation Models

    Paper • 2405.09818 • Published May 16, 2024 • 131

  • Matryoshka Multimodal Models

    Paper • 2405.17430 • Published May 27, 2024 • 34

  • Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Paper • 2406.02430 • Published Jun 4, 2024 • 37

    Note: The input tokens (instruction + script) are converted into speech tokens by a Transformer, after which a diffusion model (DM) fleshes out the finer details.


  • An Image is Worth 32 Tokens for Reconstruction and Generation

    Paper • 2406.07550 • Published Jun 11, 2024 • 60

  • ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

    Paper • 2407.04172 • Published Jul 4, 2024 • 27

  • Building and better understanding vision-language models: insights and future directions

    Paper • 2408.12637 • Published Aug 22, 2024 • 131

  • Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

    Paper • 2409.09214 • Published Sep 13, 2024 • 55