---
license: apache-2.0
base_model:
- ACE-Step/ACE-Step-v1-3.5B
pipeline_tag: text-to-audio
tags:
- gguf-node
---
gguf quantized ace-step-v1-3.5b
- base model from ace-step
- full gguf set (model + encoder + vae); works right away
setup (once)
- drag ace-step to >
./ComfyUI/models/diffusion_models
- drag umt5-base to >
./ComfyUI/models/text_encoders
- drag pig to >
./ComfyUI/models/vae
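For anyone who prefers a script to dragging files, the three steps above can be sketched as follows. This is only a minimal sketch: `place_files` and the role names ("model", "encoder", "vae") are hypothetical helpers for illustration, and the downloaded file names are whatever you fetched from this repo; only the destination folders come from the list above.

```python
import shutil
from pathlib import Path

# Destination folders from the setup list above, relative to the ComfyUI root.
ACE_STEP_DESTINATIONS = {
    "model": "models/diffusion_models",   # ace-step gguf
    "encoder": "models/text_encoders",    # umt5-base gguf
    "vae": "models/vae",                  # pig gguf
}


def place_files(comfyui_root, files, destinations=ACE_STEP_DESTINATIONS):
    """Copy each downloaded file into its ComfyUI folder.

    `files` maps a role ("model", "encoder", "vae"; hypothetical names)
    to the path of a downloaded file. Returns a dict of role -> new path.
    """
    root = Path(comfyui_root)
    placed = {}
    for role, src in files.items():
        target_dir = root / destinations[role]
        # Create the folder if this ComfyUI install does not have it yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        placed[role] = Path(shutil.copy2(src, target_dir))
    return placed
```

Calling `place_files("./ComfyUI", {"model": ..., "encoder": ..., "vae": ...})` with your own download paths copies the files and leaves the originals in place.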
workflow
- drag the json or the demo audio below into your browser to load the workflow
Prompt | Audio Sample |
---|---|
female singing pop music electronic beats fennec corecute fennec girl massive fennec ears big fluffy tail long blonde wavy hair large blue eyes I love fennec girl | 🎧 ace-step |
review
- note: some key tensors must be kept in f32 for the model to work, so the file size might not decrease that much; but it generally loads faster than the safetensors checkpoint (no last-minute bottleneck problem)
- the rebuilt umt5-base tokenizer logic has been applied successfully (similar to umt5-xxl; credit goes to city96 and all the other contributors who worked on solving that issue); upgrade your node to the latest version for umt5-base encoder support; hence, the safetensors checkpoint is no longer needed (removed here; if you still want it, you can get it from comfyui-org)
- get more umt5-base encoder here
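To sanity-check a downloaded gguf file before loading it, you can read its fixed-size header. The sketch below is not part of gguf-node; `read_gguf_header` is a hypothetical helper, and it assumes the little-endian GGUF layout used since version 2 (4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts).

```python
import struct


def read_gguf_header(path):
    """Return version, tensor count and metadata key count of a GGUF file.

    Assumes the little-endian GGUF v2+ header layout; raises ValueError
    if the file is too short or does not start with the GGUF magic.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

A truncated or mislabeled download fails here with a clear error instead of failing later inside the workflow.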
bonus: fp8/16/32 scaled stable-audio-open-1.0 with gguf quantized t5_base encoder
- base model from stabilityai
- note: this is a different model; don't mix it up with ace-step; it is also powerful and lightweight
- dry running
setup (once)
- drag t5-base to >
./ComfyUI/models/text_encoders
- drag safetensors to >
./ComfyUI/models/checkpoints
- drag pig to >
./ComfyUI/models/vae
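Before running the workflow, a quick check that each of the three folders above actually holds a file can save a failed first run. This is a minimal sketch: `folders_missing_files` is a hypothetical helper, and it only checks that some regular file is present, not that it is the right one.

```python
from pathlib import Path

# Folders named in the setup list above, relative to the ComfyUI root.
STABLE_AUDIO_FOLDERS = (
    "models/text_encoders",  # t5-base encoder
    "models/checkpoints",    # safetensors checkpoint
    "models/vae",            # pig vae
)


def folders_missing_files(comfyui_root, folders=STABLE_AUDIO_FOLDERS):
    """Return the folders that do not exist or contain no regular files."""
    root = Path(comfyui_root)
    missing = []
    for rel in folders:
        folder = root / rel
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            missing.append(rel)
    return missing
```

An empty list from `folders_missing_files("./ComfyUI")` means every folder has at least one file in it.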
Prompt | Audio Sample |
---|---|
heaven church electronic dance music | 🎧 stable-audio |
review
- note: the safetensors checkpoint in this repo is an extracted version; it contains only the model and condition switch tensors (extremely lightweight); no clip and no vae inside; use it together with a separate clip (text encoder) and vae
- optionally, get the fp8/16/32 scaled checkpoint with model and vae embedded here
- get more t5-base encoder here
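To see for yourself which tensor groups a safetensors checkpoint contains, you can read its JSON header without loading any weights. The sketch below relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); `safetensors_prefix_counts` is a hypothetical helper, and the actual tensor name prefixes in this repo's checkpoint are not documented here, so treat whatever it prints as informational.

```python
import json
import struct
from collections import Counter


def safetensors_prefix_counts(path):
    """Count tensors per top-level name prefix in a safetensors file.

    Reads only the JSON header, so it is fast even for large checkpoints.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry; skip it.
    names = [name for name in header if name != "__metadata__"]
    return Counter(name.split(".", 1)[0] for name in names)
```

If the extracted checkpoint is as described above, the result should show no text encoder or vae group, while an embedded checkpoint would show additional groups.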
reference
- comfyui from comfyanonymous
- pig architecture from connector
- gguf-node (pypi|repo|pack)