	---
	license: apache-2.0
	base_model:
	- ACE-Step/ACE-Step-v1-3.5B
	pipeline_tag: text-to-audio
	tags:
	- gguf-node
	---
	## gguf quantized ace-step-v1-3.5b
	- base model from [ace-step](https://huggingface.co/ACE-Step)
	- the full gguf set (model + encoder + vae) works right away

	### setup (once)
	- drag ace-step to > `./ComfyUI/models/diffusion_models`
	- drag umt5-base to > `./ComfyUI/models/text_encoders`
	- drag pig to > `./ComfyUI/models/vae`
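a minimal python sketch of the setup steps above, for anyone who prefers a script over dragging files. the folder names come from the list; the filename keywords (`umt5`, `pig`, `ace`), the `place_models` helper and the download folder are assumptions for illustration, so adjust them to match the files you actually downloaded.

```python
import shutil
from pathlib import Path

# Folder names come from the setup list; the filename keywords are
# assumptions about how the downloaded files are named.
# More specific keywords are listed first so they win over "ace".
ACE_TARGETS = {
    "umt5": "text_encoders",     # umt5-base encoder
    "pig": "vae",                # pig vae
    "ace": "diffusion_models",   # ace-step model
}


def place_models(download_dir, comfy_root, targets=ACE_TARGETS):
    """Copy each file whose name contains a keyword into its models folder."""
    models_dir = Path(comfy_root) / "models"
    placed = {}
    for path in sorted(Path(download_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name.lower()
        for keyword, folder in targets.items():
            if keyword in name:
                dest = models_dir / folder
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest / path.name)
                placed[path.name] = folder
                break  # first matching keyword wins
    return placed
```

calling `place_models("./downloads", "./ComfyUI")` returns a dict of filename to target folder, which makes it easy to spot a file that was not picked up.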

	![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/ace.png)

	### workflow
	- drag the json or the demo audio below into the browser to load the workflow

	| Prompt | Audio Sample |
	|--------|---------------|
	| female singing pop music electronic beats fennec core<br/>`cute fennec girl`<br/>`massive fennec ears`<br/>`big fluffy tail`<br/>`long blonde wavy hair`<br/>`large blue eyes`<br/>`I love fennec girl` | 🎧 ace-step<br><audio controls src="https://huggingface.co/calcuis/ace-gguf/resolve/main/samples%5Cace.flac"></audio> |

	## review
	- note: some key tensors must stay in f32 for the model to work, so the file size might not decrease that much; it still loads faster than the safetensors checkpoint in general (no last-minute bottleneck problem)
	- the umt5-base tokenizer rebuild logic has been applied successfully (similar to umt5-xxl; credit goes to city96 and all other contributors who worked on solving that issue); upgrade your node to the latest version for umt5-base encoder support; the safetensors checkpoint is therefore no longer needed (removed here; if you still want it, you can get it from [comfyui-org](https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/tree/main/all_in_one))
	- get more umt5-base encoders [here](https://huggingface.co/chatpig/umt5-base-encoder-gguf/tree/main)
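a rough back-of-the-envelope sketch of why keeping some tensors in f32 limits the size reduction. only the 3.5e9 figure comes from the model name (treated here as the total weight count, which is an assumption); the f32 shares and the 4.5 bits per weight are hypothetical values, and `estimate_size_bytes` is an illustrative helper, not part of any tool.

```python
def estimate_size_bytes(n_params, f32_fraction, quant_bits):
    """Rough file size when a share of the weights stays in f32.

    f32 weights cost 4 bytes each; the rest cost quant_bits / 8 bytes each.
    Metadata and per-file overhead are ignored.
    """
    f32_params = n_params * f32_fraction
    quant_params = n_params - f32_params
    return f32_params * 4 + quant_params * quant_bits / 8


if __name__ == "__main__":
    n_params = 3.5e9  # from the model name; every other number is hypothetical
    for fraction in (0.0, 0.1, 0.3):
        size_gb = estimate_size_bytes(n_params, fraction, quant_bits=4.5) / 1e9
        print(f"f32 share {fraction:.0%}: ~{size_gb:.1f} GB")
```

each f32 weight costs several times what a quantized weight does, so even a modest f32 share adds noticeably to the total.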

	---

	## bonus: fp8/16/32 scaled stable-audio-open-1.0 with gguf quantized t5_base encoder
	- base model from [stabilityai](https://huggingface.co/stabilityai/stable-audio-open-1.0)
	- note: this is a different model; don't mix it up with ace-step; it is also powerful and lightweight
	- dry running

	### setup (once)
	- drag t5-base to > `./ComfyUI/models/text_encoders`
	- drag safetensors to > `./ComfyUI/models/checkpoints`
	- drag pig to > `./ComfyUI/models/vae`
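a small python sketch for checking that the three folders above are populated before loading the workflow. the folder names come from the list; the filename keywords (`t5`, `.safetensors`, `pig`) and the `missing_models` helper are assumptions for illustration only.

```python
from pathlib import Path

# Folder names come from the setup list; the keywords are assumptions
# about how the downloaded files are named.
SD_AUDIO_REQUIRED = {
    "text_encoders": "t5",
    "checkpoints": ".safetensors",
    "vae": "pig",
}


def missing_models(comfy_root, required=SD_AUDIO_REQUIRED):
    """Return the models subfolders that hold no file matching their keyword."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, keyword in required.items():
        folder_path = models_dir / folder
        found = folder_path.is_dir() and any(
            keyword in p.name.lower()
            for p in folder_path.iterdir()
            if p.is_file()
        )
        if not found:
            missing.append(folder)
    return missing
```

an empty list from `missing_models("./ComfyUI")` means every folder has at least one matching file; any folder name in the list still needs its file.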

	![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/sd-audio.png)

	| Prompt | Audio Sample |
	|--------|---------------|
	| heaven church electronic dance music | 🎧 stable-audio<br><audio controls src="https://huggingface.co/calcuis/ace-gguf/resolve/main/samples%5Csd.flac"></audio> |

	## review
	- note: the safetensors checkpoint in this repo is an extracted version; it contains only the model and condition switch tensors (extremely lightweight), with no clip or vae inside; use it together with a separate clip (text encoder) and vae
	- optionally, get an fp8/16/32 scaled checkpoint with model and vae embedded [here](https://huggingface.co/convertor/sa1-fp8/tree/main)
	- get more t5-base encoders [here](https://huggingface.co/chatpig/t5-base-encoder-gguf/tree/main)

	### reference
	- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
	- pig architecture from [connector](https://huggingface.co/connector)
	- gguf-node ([pypi](https://pypi.org/project/gguf-node) | [repo](https://github.com/calcuis/gguf) | [pack](https://github.com/calcuis/gguf/releases))