---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-image
tags:
- nf4
- Abliterated
- Qwen2.5-VL7b-Abliterated
- instruct
- Diffusers
- Transformers
- uncensored
- text-to-image
- image-to-image
- image-generation
---
<p align="center">
  <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png" width="200"/>
</p>

# QWEN-IMAGE MODEL (NF4) w/Abliterated Qwen2.5-VL-7B
This repo contains a variant of Qwen's **[QWEN-IMAGE](https://huggingface.co/Qwen/Qwen-Image)**, a state-of-the-art generative model with extensive text-to-image, image-to-image, and instruction/control-based editing capabilities. <br>
To make these cutting-edge capabilities more accessible on low-end consumer-grade hardware, **we've quantized the DiT (Diffusion Transformer) component of Qwen-Image to the 4-bit NF4 format** using the BitsAndBytes toolkit. <br>
This quantization was derived by us directly from the BF16 base model weights released on 08/04/2025, with no other mix-ins or modifications to the DiT component. <br>
QWEN-IMAGE is an open-weights, customization-friendly frontier model released under the highly permissive Apache 2.0 license, which allows unrestricted commercial use and modification. <br>
To further highlight the possibilities opened up by the release of QWEN-IMAGE, our quantization is bundled with an "Abliterated" (i.e. de-censored) finetune of [Qwen2.5-VL 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), which serves as QWEN-IMAGE's sole conditioning encoder (of prompts, instructions, input images, controls, etc.) and is a powerful Vision-Language Model in its own right. <br>
Specifically, this repo pairs the lean NF4 DiT with [Qwen2.5-VL-7B-Abliterated-Caption-it](https://huggingface.co/prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it/tree/main) by [Prithiv Sakthi](https://huggingface.co/prithivMLmods) (aka [prithivMLmods](https://github.com/prithivsakthiur)).

This repo is formatted for use with the Diffusers (0.35.0.dev0+) and Transformers libraries, via the pipeline and model component classes listed in `model_index.json` (in this repo's root folder). <br>
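As a toy illustration of what NF4 quantization does conceptually (this is a pure-Python sketch, *not* the bitsandbytes kernel): each block of weights is scaled by its absolute maximum, and every scaled weight is snapped to the nearest of 16 fixed levels spaced for normally distributed data. The codebook values below are those published with QLoRA/bitsandbytes.

```python
# Conceptual sketch of NF4 block quantization (NOT the bitsandbytes kernel).
# NF4 codebook: 16 levels spaced for zero-mean, normally distributed weights.
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(block):
    """Return (4-bit codebook indices, absmax scale) for one block of weights."""
    absmax = max(abs(x) for x in block) or 1.0
    indices = [
        min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - x / absmax))
        for x in block
    ]
    return indices, absmax

def dequantize_block(indices, absmax):
    """Reconstruct approximate weights from indices and the block scale."""
    return [NF4_CODEBOOK[i] * absmax for i in indices]

indices, absmax = quantize_block([0.5, -0.25, 0.0, 0.1])
restored = dequantize_block(indices, absmax)
```

Each weight is stored in 4 bits plus one shared scale per block (64 weights per block in bitsandbytes), which is where most of the memory savings over BF16 come from.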

# NOTICE:
*Do not be alarmed by the file warning from the ClamAV automated checker.* <br>
*It is a clear false positive. In scanning one of the typical Diffusers-adapted Safetensors shards (model weights), the checker reports:* <br>
``The following viruses have been found: Pickle.Malware.SysAccess.sys.STACK_GLOBAL.UNOFFICIAL`` <br>
*However, a Safetensors file cannot contain such inserts. You may confirm this yourself through HF's built-in weight/tensor index viewer. <br>
To be sure, this repo does **not** contain any pickle checkpoints, or any other pickled data.* <br>
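For context on why the warning is spurious: a `.safetensors` file is just an 8-byte little-endian header length, a JSON header, and raw tensor bytes. Reading one is pure data parsing and never executes code, unlike pickle, whose opcode stream (where `STACK_GLOBAL` lives) is interpreted on load. A minimal stdlib-only sketch of the format, using a hypothetical one-tensor file:

```python
import json
import struct

# Build a minimal .safetensors payload by hand: an 8-byte little-endian
# header length, a JSON header describing each tensor, then raw bytes.
header = {"weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
header_bytes = json.dumps(header).encode("utf-8")
payload = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(16)

# Reading it back: json.loads on the header, then slicing raw bytes at the
# declared offsets. No opcodes are interpreted, so a pickle construct like
# STACK_GLOBAL simply has no place to exist in this format.
(header_len,) = struct.unpack("<Q", payload[:8])
parsed = json.loads(payload[8:8 + header_len])
start, end = parsed["weight"]["data_offsets"]
tensor_bytes = payload[8 + header_len + start:8 + header_len + end]
```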

# TEXT-TO-IMAGE PIPELINE EXAMPLE:
*Sourced/adapted from [the original base model repo](https://huggingface.co/Qwen/Qwen-Image) by QWEN.*
```python
from diffusers import DiffusionPipeline
import torch

model_name = "AlekseyCalvin/QwenImage_nf4"

# Pick dtype/device based on hardware availability
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

# Load the pipeline
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

# Quality-boosting suffix appended to the prompt
positive_magic = {
    "en": "Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": "超清,4K,电影级构图",  # for Chinese prompts
}

# Generate image
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition'''
negative_prompt = " "

# Generate with different aspect ratios
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
}
width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

image.save("example.png")
```
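For a rough sense of the memory savings from the NF4 DiT (assuming the commonly cited ~20B-parameter figure for the Qwen-Image DiT, and bitsandbytes' default blocksize of 64 with one fp32 absmax per block; both are assumptions here, not measurements of this repo's shards):

```python
# Back-of-the-envelope memory footprint for the DiT weights alone.
params = 20e9                       # assumed parameter count (~20B)
bf16_bytes_per_param = 2.0          # 16 bits per weight
nf4_bytes_per_param = 0.5 + 4 / 64  # 4 bits + one fp32 absmax per 64-weight block
bf16_gb = params * bf16_bytes_per_param / 1e9
nf4_gb = params * nf4_bytes_per_param / 1e9
print(f"BF16: ~{bf16_gb:.0f} GB, NF4: ~{nf4_gb:.1f} GB")
```

Under these assumptions the DiT weights shrink from roughly 40 GB to a little over 11 GB, which is what brings the model within reach of consumer GPUs.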
<br>

# SHOWCASES FROM THE QWEN TEAM:



# MORE INFO:
- Check out the [Technical Report](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf) for QWEN-IMAGE, released by the Qwen team! <br>
- Find the source base model weights at [Hugging Face](https://huggingface.co/Qwen/Qwen-Image) and at [ModelScope](https://modelscope.cn/models/Qwen/Qwen-Image).

## QWEN LINKS:
<p align="center">
💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/Qwen/Qwen-Image">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf">Tech Report</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://qwenlm.github.io/blog/qwen-image/">Blog</a>
<br>
🖥️ <a href="https://huggingface.co/spaces/Qwen/qwen-image">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="https://github.com/QwenLM/Qwen-Image/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp;&nbsp; | &nbsp;&nbsp;🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>
</p>

## QWEN-IMAGE TECHNICAL REPORT CITATION:
```bibtex
@article{qwen-image,
    title={Qwen-Image Technical Report},
    author={Qwen Team},
    journal={arXiv preprint},
    year={2025}
}
```