# Stable Diffusion 1.5 Latent Consistency Model for RKNN2

Run the Stable Diffusion 1.5 LCM image-generation model using RKNPU2!
- Inference speed (RK3588, single NPU core):
  - 384x384: text encoder 0.05 s + U-Net 2.36 s/it + VAE decoder 5.48 s
  - 512x512: text encoder 0.05 s + U-Net 5.65 s/it + VAE decoder 11.13 s
- Memory usage:
  - 384x384: about 5.2 GB
  - 512x512: about 5.6 GB
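Per-step time and memory grow with resolution because SD 1.5 runs its U-Net on a latent grid one-eighth the image size. A small sketch (pure Python; the helper name is mine, not from this repo's code) computing the latent tensor shape for each resolution:

```python
# SD 1.5's VAE downsamples spatially by 8x, and the U-Net latent has 4 channels.
VAE_SCALE = 8
LATENT_CHANNELS = 4

def latent_shape(width: int, height: int, batch: int = 1):
    """Return the (N, C, H, W) shape of the U-Net latent for a given image size."""
    assert width % VAE_SCALE == 0 and height % VAE_SCALE == 0, "size must be a multiple of 8"
    return (batch, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)

print(latent_shape(384, 384))  # (1, 4, 48, 48)
print(latent_shape(512, 512))  # (1, 4, 64, 64)
```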
## Usage
1. Clone or download this repository to your local machine.
2. Install the dependencies (note the quotes around `"numpy<2"`, so the shell does not treat `<` as a redirect):

   ```bash
   pip install diffusers pillow "numpy<2" rknn-toolkit-lite2
   ```

3. Run the generation script:

   ```bash
   python ./run_rknn-lcm.py -i ./model -o ./images --num-inference-steps 4 -s 512x512 --prompt "Majestic mountain landscape with snow-capped peaks, autumn foliage in vibrant reds and oranges, a turquoise river winding through a valley, crisp and serene atmosphere, ultra-realistic style."
   ```
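Under the hood, a pipeline like this converts the VAE decoder's float output in [-1, 1] into an 8-bit image. A minimal sketch of that standard postprocessing step (the input array here is random, standing in for a real decoder output; the function name is illustrative):

```python
import numpy as np
from PIL import Image

def decoded_to_image(decoded: np.ndarray) -> Image.Image:
    """Convert a (1, 3, H, W) VAE decoder output in [-1, 1] to a PIL image."""
    img = (decoded / 2 + 0.5).clip(0, 1)                               # [-1, 1] -> [0, 1]
    img = (img[0].transpose(1, 2, 0) * 255).round().astype(np.uint8)   # CHW -> HWC, 8-bit
    return Image.fromarray(img)

# Random data in place of a real VAE decoder output, just for illustration
fake = np.random.uniform(-1, 1, (1, 3, 512, 512)).astype(np.float32)
image = decoded_to_image(fake)
print(image.size)  # (512, 512)
```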
## Model Conversion
Install the dependencies:

```bash
pip install diffusers pillow "numpy<2" rknn-toolkit2
```
1. Download the model

   Download a Stable Diffusion 1.5 LCM model in ONNX format and place it in the `./model` directory:

   ```bash
   huggingface-cli download TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
   cp -r -L ~/.cache/huggingface/hub/models--TheyCallMeHex--LCM-Dreamshaper-V7-ONNX/snapshots/4029a217f9cdc0437f395738d3ab686bb910ceea ./model
   ```
In theory, you could also achieve LCM inference by merging the LCM LoRA into a regular Stable Diffusion 1.5 model and then converting the result to ONNX format. However, I'm not sure how to do this; if anyone knows, please feel free to submit a PR.
2. Convert the model

   ```bash
   # Convert the model at 384x384 resolution
   python ./convert-onnx-to-rknn.py -m ./model -r 384x384
   ```
Note that the higher the resolution, the larger the converted model and the longer the conversion takes, so very high resolutions are not recommended.
## Known Issues
Models converted with rknn-toolkit2 version 2.2.0 suffered from severe precision loss, even when using the fp16 data type. As shown in the comparison image, the top is the result of inference using the ONNX model and the bottom is the result using the RKNN model, with all parameters identical; the higher the resolution, the more severe the precision loss. This was a bug in rknn-toolkit2 and is fixed in v2.3.0.

Also, the model conversion script can in principle accept multiple resolutions (e.g., "384x384,256x256"), but doing so currently causes the conversion to fail. This is a bug in rknn-toolkit2.
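To quantify precision loss when comparing ONNX and RKNN outputs yourself, a simple element-wise error check is usually enough. A sketch with synthetic arrays standing in for real model outputs (here, fp16 rounding of an fp32 reference simulates a mild precision loss):

```python
import numpy as np

def compare_outputs(ref: np.ndarray, test: np.ndarray) -> dict:
    """Report element-wise error between a reference (e.g. ONNX) and a test (e.g. RKNN) tensor."""
    diff = np.abs(ref.astype(np.float32) - test.astype(np.float32))
    return {"max_abs": float(diff.max()), "mean_abs": float(diff.mean())}

# Synthetic example: fp16 rounding of an fp32 tensor shaped like a 512x512 U-Net latent
ref = np.random.randn(1, 4, 64, 64).astype(np.float32)
test = ref.astype(np.float16).astype(np.float32)
stats = compare_outputs(ref, test)
print(stats)  # fp16 rounding alone stays small; much larger gaps point to a conversion bug
```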
## References
- Base model: TheyCallMeHex/LCM-Dreamshaper-V7-ONNX