---
license: apache-2.0
base_model:
- ACE-Step/ACE-Step-v1-3.5B
pipeline_tag: text-to-audio
tags:
- gguf-node
---
## gguf quantized ace-step-v1-3.5b
- base model from [ace-step](https://huggingface.co/ACE-Step)
- full gguf set (model + encoder + vae); works right away
### **setup (once)**
- drag **ace-step** to > `./ComfyUI/models/diffusion_models`
- drag **umt5-base** to > `./ComfyUI/models/text_encoders`
- drag **pig** to > `./ComfyUI/models/vae`
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/ace.png)
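
the folder mapping above can be sketched as a small helper; this is only an illustration, not part of gguf-node. matching files by the keywords `ace-step`, `umt5-base` and `pig` in the filename is an assumption, and the function name `target_dir` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Keyword -> ComfyUI subfolder, following the setup list above.
# Matching on filename keywords is an assumption made for this sketch.
TARGETS = {
    "ace-step": "models/diffusion_models",
    "umt5-base": "models/text_encoders",
    "pig": "models/vae",
}


def target_dir(filename: str, comfy_root: str = "./ComfyUI") -> Optional[Path]:
    """Return the folder a downloaded file belongs in, or None if no keyword matches."""
    name = filename.lower()
    for keyword, subfolder in TARGETS.items():
        if keyword in name:
            return Path(comfy_root) / subfolder
    return None
```
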
### workflow
- drag the json or the demo audio below into the browser to load the workflow
| Prompt | Audio Sample |
|--------|---------------|
|**female singing pop music electronic beats fennec core**<br/>`cute fennec girl`<br/>`massive fennec ears`<br/>`big fluffy tail`<br/>`long blonde wavy hair`<br/>`large blue eyes`<br/>`I love fennec girl`<br/> | 🎧 **ace-step**<br><audio controls src="https://huggingface.co/calcuis/ace-gguf/resolve/main/samples%5Cace.flac"></audio> |
## review
- note: some key tensors must stay in f32 for the model to work, so the file size might not decrease that much; in general it still loads faster than the safetensors checkpoint (no last-minute bottleneck problem)
- the rebuilt umt5-base tokenizer logic works (similar to umt5-xxl; credit goes to city96 and all other contributors who worked on solving that issue); upgrade your node to the latest version for umt5-base encoder support; hence, the safetensors checkpoint is no longer needed (removed here; if you still want it, you can get it from [comfyui-org](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/tree/main/all_in_one))
- get more **umt5-base** encoders [here](https://huggingface.co/chatpig/umt5-base-encoder-gguf/tree/main)
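
to see why keeping some tensors in f32 limits the size reduction, here is a rough back-of-the-envelope sketch. the f32 fraction and the bit width in the example calls are hypothetical, not measured from the files in this repo; the 3.5e9 parameter count is taken from the model name only, and metadata overhead is ignored.

```python
def estimate_size_gb(n_params: float, f32_fraction: float, quant_bits: float) -> float:
    """Rough file size in GB when a fraction of the weights stays in f32 (32 bits)
    and the rest is stored at quant_bits per weight. Ignores metadata overhead."""
    if not 0.0 <= f32_fraction <= 1.0:
        raise ValueError("f32_fraction must be between 0 and 1")
    bits_per_weight = f32_fraction * 32 + (1 - f32_fraction) * quant_bits
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative inputs only (hypothetical fractions, not taken from this repo):
full_f32 = estimate_size_gb(3.5e9, 1.0, 4)     # everything in f32
all_4bit = estimate_size_gb(3.5e9, 0.0, 4)     # everything at 4 bits per weight
quarter_f32 = estimate_size_gb(3.5e9, 0.25, 4)  # a quarter of the weights kept in f32
```
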
---
## bonus: fp8/16/32 scaled stable-audio-open-1.0 with gguf quantized t5_base encoder
- base model from [stabilityai](https://huggingface.co/stabilityai/stable-audio-open-1.0)
- note: this is a different model; don't mix it up; also powerful and lightweight
- dry running
### **setup (once)**
- drag **t5-base** to > `./ComfyUI/models/text_encoders`
- drag **safetensors** to > `./ComfyUI/models/checkpoints`
- drag **pig** to > `./ComfyUI/models/vae`
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/sd-audio.png)
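
a quick way to confirm the three pieces from the setup list are in place before loading the workflow; a minimal sketch, not part of gguf-node. the substring checks (`t5-base`, `.safetensors`, `pig`) and the name `missing_parts` are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder -> substring expected in at least one filename there.
# The folders come from the setup list; matching on substrings is an assumption.
# Caveat: a umt5-base filename also contains "t5-base" and would pass that check.
EXPECTED: Dict[str, str] = {
    "models/text_encoders": "t5-base",
    "models/checkpoints": ".safetensors",
    "models/vae": "pig",
}


def missing_parts(comfy_root: str = "./ComfyUI") -> List[str]:
    """Return the subfolders that have no file matching the expected substring."""
    root = Path(comfy_root)
    missing = []
    for subfolder, needle in EXPECTED.items():
        folder = root / subfolder
        found = folder.is_dir() and any(
            p.is_file() and needle in p.name.lower() for p in folder.iterdir()
        )
        if not found:
            missing.append(subfolder)
    return missing
```
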
| Prompt | Audio Sample |
|--------|---------------|
|**heaven church electronic dance music** | 🎧 **stable-audio**<br><audio controls src="https://huggingface.co/calcuis/ace-gguf/resolve/main/samples%5Csd.flac"></audio> |
## review
- note: the safetensors checkpoint in this repo is an extracted version; it contains only the model and condition switch tensors (extremely lightweight); no clip and vae inside; use it along with a separate clip (text encoder) and vae
- optionally, get the fp8/16/32 scaled checkpoint with model and vae embedded [here](https://huggingface.co/convertor/sa1-fp8/tree/main)
- get more **t5-base** encoders [here](https://huggingface.co/chatpig/t5-base-encoder-gguf/tree/main)
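
to check what a safetensors checkpoint actually contains (for example, that no clip or vae tensors are inside), the header can be read without loading any weights. a minimal sketch using only the standard library; it relies on the safetensors layout (8-byte little-endian header length, then a json header), and the helper name `tensor_prefixes` is hypothetical. the actual prefixes in this repo's checkpoint are not listed here.

```python
import json
import struct
from collections import Counter


def tensor_prefixes(path: str) -> Counter:
    """Count tensors per top-level name prefix by reading only the safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return Counter(name.split(".", 1)[0] for name in header)
```

calling `tensor_prefixes("your_checkpoint.safetensors")` (path is a placeholder) returns a counter of prefixes, so an extracted checkpoint should show only a few groups.
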
### **reference**
- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
- pig architecture from [connector](https://huggingface.co/connector)
- gguf-node ([pypi](https://pypi.org/project/gguf-node)|[repo](https://github.com/calcuis/gguf)|[pack](https://github.com/calcuis/gguf/releases))