---
license: mit
language: en
tags:
- stable-diffusion
- transformer
- bnb-4bit
- quantization
- nf4
- fp4
---

# RediDream NSFW I1 Quantized (bnb-4bit: NF4 & FP4)

This is a quantized version of the original [RediDream NSFW I1 model](https://civitai.com/models/958009?modelVersionId=1719149), converted using the `bnb-4bit` quantization method. It includes two variants:

- `nf4` (NormalFloat4) quantization
- `fp4` (Float4) quantization

Both are saved in the `safetensors` format and have smaller file sizes than the official GGUF-quantized "NF4" version (which is in fact a Q4_K_M quantization, not true NF4): 9.6 GB vs 10.7 GB. Generation is somewhat slower than the GGUF version (single-step time of 1.5 s vs 1.1 s), and output quality is comparable to Q4_0 quantization.

## Quantization Method

The quantization is inspired by [`azaneko/HiDream-I1-Full-nf4`](https://huggingface.co/azaneko/HiDream-I1-Full-nf4), but it does **not** follow the standard `accelerate` + `bitsandbytes` pipeline. Instead, a custom low-level approach was used to perform **offline quantization** by:

- Loading model layers one by one
- Skipping non-quantizable layers (e.g., MoE gates, output heads)
- Applying proper group size, `absmax`, and precision control

This approach reduces memory usage during quantization to ~40 GB and is **entirely independent of the original model code or structure**. It allows quantization of large, complex transformer models with multiple tokenizers or text encoders, such as HiDream, which are otherwise difficult to quantize online due to their memory requirements. (An illustrative sketch of this layer-by-layer approach appears in the appendix after the Q&A below.)

> ⚠️ Note: Scripts for this quantization method will be open-sourced at a later time.

## Performance

- The quantized models perform **similarly** to the Q4_0 quantized version.
- No formal benchmark or deep evaluation has been conducted.

## Files

- `pytorch_model-nf4.safetensors` – 4-bit NF4 quantized model
- `pytorch_model-fp4.safetensors` – 4-bit FP4 quantized model

## 🧩 ComfyUI Loading Nodes (silveroxides bnb-4bit Loader)

To load the 4-bit quantized HiDream I1 model in ComfyUI, use the nodes provided by the [ComfyUI_bnb_nf4_fp4_Loaders](https://github.com/silveroxides/ComfyUI_bnb_nf4_fp4_Loaders) repository.

1. **QuantizedCheckpoint Loader**
   - **Repository**: https://github.com/silveroxides/ComfyUI_bnb_nf4_fp4_Loaders
2. **KSampler Settings**
   - **Skip layers**: **2–6** (to improve stability with quantized weights)
   - **Steps**: **15–25**
   - **Sampler**: **LCM** (more stable on 4-bit quantized models)
   - **Schedule**: **normal**
   - **CFG scale**: **1.0**

## Quick Q&A

**1. What's the difference from the official NF4?**
The official RediDream "NF4" GGUF release is actually a Q4_K_M GGUF quantization, not a true bnb NF4. The official file is therefore slightly larger than ours (10.7 GB vs 9.6 GB) and produces somewhat better results. Our NF4 model needs about 25+ steps to reach the quality of the official version, and each step is roughly 30% slower on average, but VRAM usage is slightly lower (11.6 GB vs 13.3 GB). Overall, our model lands roughly at the level of Q4_0 quantization.

**2. How's the performance?**
The quantized model requires more conservative settings (e.g., skipping layers 2–6, 15–25 steps, CFG 1.0) to produce high-quality images, so generation may be a bit slower (single-step time of 1.5 s vs 1.1 s on an RTX 4080).

**3. How does it compare with GGUF's Q4_0?**
bnb-4bit performs similarly to Q4_0, sometimes slightly weaker, but with a smaller footprint. The NF4 variant can even run faster on certain hardware (with proper software support). Additionally, ComfyUI's GGUF loader appears to have memory-leak issues, whereas FP4/NF4 loading has not exhibited this problem.
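## Appendix: Offline Layer-by-Layer Quantization (Illustrative Sketch)

The actual quantization scripts for this release are not yet public, so the snippet below is only a minimal, hypothetical sketch of the general technique described in the Quantization Method section: stream tensors out of a `safetensors` checkpoint one at a time, skip layers that should stay in full precision, and quantize the rest to NF4 with `bitsandbytes`. The file paths, skip patterns, blocksize, and helper names are illustrative assumptions, not the settings used for this model.

```python
# Minimal sketch only: offline, layer-by-layer bnb-4bit (NF4) quantization.
# Assumes bitsandbytes >= 0.43, PyTorch with CUDA, and a single-file
# safetensors checkpoint. Paths and skip patterns are placeholders.
import torch
import bitsandbytes.functional as bnbF
from safetensors import safe_open
from safetensors.torch import save_file

SRC = "pytorch_model.safetensors"      # hypothetical input checkpoint
DST = "pytorch_model-nf4.safetensors"  # hypothetical output file

# Layers kept in full precision (e.g. MoE gates, output heads, norms, embeddings).
SKIP_PATTERNS = ("gate", "lm_head", "norm", "embed")  # placeholder patterns

def should_quantize(name: str, tensor: torch.Tensor) -> bool:
    """Quantize only 2-D weight matrices that are not on the skip list."""
    return tensor.ndim == 2 and not any(p in name for p in SKIP_PATTERNS)

out_tensors = {}
with safe_open(SRC, framework="pt") as f:
    for name in f.keys():
        w = f.get_tensor(name)  # tensors are read one at a time, keeping RAM low
        if not should_quantize(name, w):
            out_tensors[name] = w  # leave sensitive layers untouched
            continue
        # Block-wise NF4 quantization; blocksize sets the absmax group size.
        packed, qstate = bnbF.quantize_4bit(
            w.to("cuda", torch.float16), blocksize=64, quant_type="nf4"
        )
        out_tensors[name] = packed.cpu()
        # Persist the quantization state (absmax, quant map, shape, ...) next to
        # the packed weight, mirroring how bitsandbytes' Linear4bit serializes it.
        for key, value in qstate.as_dict(packed=True).items():
            out_tensors[f"{name}.{key}"] = value.cpu()
        del w, packed

save_file(out_tensors, DST)
```

Loading such a file requires a loader that reassembles the packed weights and their quantization state, as the ComfyUI nodes above do for this release; a plain `load_state_dict` on the original model class will not work.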
## Postscript

In the early days of diffusion models, releases were almost exclusively in `safetensors` format. The arrival of HiDream-I1, a large Transformer-based image model, changed the landscape by pushing model sizes far beyond what was typical for diffusion architectures, which in turn drove the adoption of GGUF quantization formats to keep those massive weights manageable.

However, bnb-4bit quantization offers a middle ground: it avoids an explosion of GGUF "Qn" variants, many of which are difficult to evaluate (in practice, only Q4 and above tends to be truly useful, though even Q2 can sometimes suffice). Too many format choices can fragment the community and complicate maintenance. By standardizing on bnb-4bit (with both NF4 and FP4 options), we simplify the ecosystem and make it easier for everyone to load, share, and update these ever-growing models.

## Disclaimer

This quantized model is intended for research and experimentation. Please verify performance and compatibility before use in production environments.