Add Postscript
README.md
CHANGED
@@ -67,6 +67,9 @@ The quantized model requires more conservative settings (e.g. skipping layers 2
**3. How does it compare with GGUF’s Q4_K?**

bnb‑4bit performs similarly to Q4_K, sometimes slightly weaker, but with a smaller footprint. The NF4 variant can even run faster on certain hardware (with proper software support). Additionally, ComfyUI’s GGUF loader appears to have memory-leak issues, whereas FP4/NF4 loading has not exhibited this problem.
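For reference, below is a minimal sketch of what NF4/FP4 loading can look like with `diffusers` + `bitsandbytes`; the model class and repo id are placeholders (not this project's weights) and assume a recent `diffusers` release with bitsandbytes quantization support:

```python
# Minimal sketch: loading a diffusion transformer in bnb-4bit (NF4 or FP4).
# Assumes a recent diffusers with bitsandbytes quantization support installed;
# "some-org/some-model" is a placeholder repo id, not an actual checkpoint here.
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # switch to "fp4" for the FP4 variant
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "some-org/some-model",               # placeholder; use the actual checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```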
## Postscript
In the early days of diffusion models, releases were almost exclusively in `safetensors` format. The arrival of HiDream-I1, a large Transformer-based image model, changed the landscape: model sizes ballooned far beyond what had been typical for diffusion architectures, which in turn drove the adoption of GGUF quantization to keep those massive weights manageable. bnb‑4bit quantization offers a middle ground: it avoids an explosion of GGUF “Qn” variants, many of which are difficult to evaluate (in practice, only Q4 and above tend to be truly useful, though even Q2 can sometimes suffice). Too many format choices fragment the community and complicate maintenance. By standardizing on bnb‑4bit (with both NF4 and FP4 options), we simplify the ecosystem and make it easier for everyone to load, share, and update these ever‑growing models.
## Disclaimer