RichardForests's Collections

Multimodal

updated Feb 24, 2024

  • 📺 Stable Video Diffusion 1.1

    Space • Running on Zero • MCP • 1.91k

    Generate a video from a single image


  • Generative Multimodal Models are In-Context Learners

    Paper • 2312.13286 • Published Dec 20, 2023 • 37

  • COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

    Paper • 2401.00849 • Published Jan 1, 2024 • 17

  • TheBloke/Sonya-7B-GPTQ

    Text Generation • 1B • Updated Dec 31, 2023 • 21 • 2

  • 📚 TextDiffuser 2

    Space • Sleeping • 140

    Generate images from text prompts with layout planning


  • AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

    Paper • 2402.12226 • Published Feb 19, 2024 • 45