ace-gguf / README.md
metadata
license: apache-2.0
base_model:
  - ACE-Step/ACE-Step-v1-3.5B
pipeline_tag: text-to-audio
tags:
  - gguf-node

gguf quantized ace-step-v1-3.5b

  • base model from ace-step
  • full set gguf (model+encoder+vae) works right away

setup (once)

  • drag ace-step to > ./ComfyUI/models/diffusion_models
  • drag umt5-base to > ./ComfyUI/models/text_encoders
  • drag pig to > ./ComfyUI/models/vae
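
The three drag-and-drop steps above can also be scripted. A minimal sketch, assuming a local ComfyUI checkout and files that are already downloaded; `place_files` is a hypothetical helper and the file names in the usage note are placeholders, not the actual file names in this repo:

```python
import shutil
from pathlib import Path

# component -> ComfyUI subfolder, as listed in the setup steps above
TARGET_DIRS = {
    "model": "models/diffusion_models",
    "text_encoder": "models/text_encoders",
    "vae": "models/vae",
}


def place_files(comfy_root, files):
    """Move each file into its ComfyUI folder and return the new paths.

    files: dict mapping a component name (a key of TARGET_DIRS) to a file path.
    """
    placed = {}
    for component, src in files.items():
        dest_dir = Path(comfy_root) / TARGET_DIRS[component]
        dest_dir.mkdir(parents=True, exist_ok=True)  # create folder if missing
        dest = dest_dir / Path(src).name
        shutil.move(str(src), str(dest))
        placed[component] = dest
    return placed
```

Usage would look like `place_files("./ComfyUI", {"model": "downloads/ace-step-placeholder.gguf", "text_encoder": "downloads/umt5-base-placeholder.gguf", "vae": "downloads/pig-placeholder.gguf"})`, with the placeholder names swapped for the files you actually downloaded.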

screenshot

workflow

  • drag the workflow json or the demo audio below into the browser to load the workflow
  • prompt: female singing pop music electronic beats fennec core / cute fennec girl / massive fennec ears / big fluffy tail / long blonde wavy hair / large blue eyes / I love fennec girl
  • audio sample: 🎧 ace-step

review

  • note: some key tensors need to stay in f32 for the model to work, so the file size might not decrease that much; it still loads faster than the safetensors checkpoint in general (no last-minute bottleneck problem)
  • the rebuilt umt5-base tokenizer logic has been applied successfully (similar to umt5-xxl; credit goes to city96 and all other contributors who worked on solving that issue); upgrade your node to the latest version for umt5-base encoder support; the safetensors checkpoint is therefore no longer needed (removed here; if you still want it, you can get it from comfyui-org)
  • get more umt5-base encoder here
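
The file-size note above comes down to simple arithmetic. A rough sketch, assuming the 3.5B figure from the model name as the parameter count and a made-up share of weights kept in f32 (the real share is not stated here); `estimated_size_gb` is a hypothetical helper and ignores metadata overhead:

```python
# bits per weight for a few GGUF tensor types
# (q8_0 stores 34 bytes per block of 32 weights, q4_0 stores 18 bytes)
BITS_PER_WEIGHT = {"f32": 32.0, "f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def estimated_size_gb(n_params, frac_f32, quant_type):
    """Rough file size in GB (10**9 bytes) when a share stays in f32.

    n_params:   total number of weights
    frac_f32:   share of weights kept in f32 (assumption, between 0 and 1)
    quant_type: type used for the remaining weights (key of BITS_PER_WEIGHT)
    """
    kept_bits = n_params * frac_f32 * BITS_PER_WEIGHT["f32"]
    quant_bits = n_params * (1.0 - frac_f32) * BITS_PER_WEIGHT[quant_type]
    return (kept_bits + quant_bits) / 8 / 1e9


if __name__ == "__main__":
    # illustrative shares only; not measured from the files in this repo
    for share in (0.0, 0.1, 0.25):
        print(share, round(estimated_size_gb(3.5e9, share, "q4_0"), 2))
```

The larger the f32 share, the closer the quantized file gets to the full-precision size, which is why the savings can look modest.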

bonus: fp8/16/32 scaled stable-audio-open-1.0 with gguf quantized t5_base encoder

  • base model from stabilityai
  • note: this is a different model; don't mix it up; also powerful and lightweight
  • dry running

setup (once)

  • drag t5-base to > ./ComfyUI/models/text_encoders
  • drag safetensors to > ./ComfyUI/models/checkpoints
  • drag pig to > ./ComfyUI/models/vae
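
Before loading the workflow, it can help to confirm that all three folders above actually contain a model file. A minimal sketch, assuming a local ComfyUI checkout; `missing_components` is a hypothetical helper, and the accepted file suffixes per folder are an assumption:

```python
from pathlib import Path

# folder -> accepted file suffixes (the suffix choices are an assumption)
EXPECTED = {
    "models/text_encoders": (".gguf", ".safetensors"),
    "models/checkpoints": (".safetensors",),
    "models/vae": (".gguf", ".safetensors"),
}


def missing_components(comfy_root):
    """Return the folders that hold no file with an accepted suffix."""
    missing = []
    for folder, suffixes in EXPECTED.items():
        directory = Path(comfy_root) / folder
        found = directory.is_dir() and any(
            p.suffix.lower() in suffixes
            for p in directory.iterdir()
            if p.is_file()
        )
        if not found:
            missing.append(folder)
    return missing
```

An empty list from `missing_components("./ComfyUI")` means every folder has at least one candidate file; it does not check that the file is the right model.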

screenshot

  • prompt: heaven church electronic dance music
  • audio sample: 🎧 stable-audio

review

  • note: the safetensors checkpoint in this repo is an extracted version; it contains only the model and condition switch tensors (extremely lightweight); no clip or vae inside; use it along with a separate clip (text encoder) and vae
  • opt to get fp8/16/32 scaled checkpoint with model and vae embedded here
  • get more t5-base encoder here

reference