---
library_name: diffusers
base_model: Qwen/Qwen-Image
base_model_relation: quantized
quantized_by: AlekseyCalvin
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-image
tags:
- nf4
- Abliterated
- Qwen2.5-VL7b-Abliterated
- instruct
- Diffusers
- Transformers
- uncensored
- text-to-image
- image-to-image
- image-generation
---
<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png" width="200"/>
</p>
  
# QWEN-IMAGE Model |nf4|+Abliterated Qwen2.5VL-7b
This repo contains a variant of Qwen's **[QWEN-IMAGE](https://huggingface.co/Qwen/Qwen-Image)**, a state-of-the-art generative model with extensive text-to-image, image-to-image, and instruction- or control-based editing capabilities. <br>

To make these cutting-edge capabilities more accessible to those limited to low-end consumer-grade hardware, **we've quantized the DiT (Diffusion Transformer) component of Qwen-Image to the 4-bit NF4 format** using the `bitsandbytes` library.<br>
We derived this quantization directly from the BF16 base model weights released on 08/04/2025, with no other mix-ins or modifications to the DiT component. <br>
*NOTE: Install `bitsandbytes` before running inference.* <br>
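
For readers unfamiliar with NF4, here is a minimal pure-Python sketch of the idea behind blockwise 4-bit quantization: each block of weights is scaled by its absolute maximum, and every value is then snapped to the nearest of 16 fixed levels. This is an illustration only, not the `bitsandbytes` implementation (which packs codes into bytes, runs on GPU kernels, and can quantize the scales as well). The level values are rounded to four decimals, and the function names are hypothetical.

```python
# Simplified illustration of blockwise NF4 quantization.
# NOT the bitsandbytes implementation; function names are hypothetical.
# The 16 NF4 levels are rounded to four decimals here.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(weights):
    """Map one block of floats to 4-bit codes plus a single absmax scale."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Recover approximate floats from the codes and the block scale."""
    return [NF4_LEVELS[c] * absmax for c in codes]


block = [0.5, -0.25, 0.0, 0.125]
codes, scale = quantize_block(block)
print("codes:", codes, "scale:", scale)
print("restored:", dequantize_block(codes, scale))
```

Each weight is stored as one of 16 codes (4 bits) plus a shared per-block scale, which is where the memory savings over BF16 come from, at the cost of some rounding error.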

**QWEN-IMAGE** is an open-weights, customization-friendly frontier model released under the highly permissive Apache 2.0 license, which permits commercial, experimental, artistic, academic, and other uses and modifications, within legal limits. <br>

To help highlight the possibilities opened up by the **QWEN-IMAGE** release, our quantization is bundled with an "Abliterated" (i.e. de-censored) finetune of [Qwen2.5-VL 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). Qwen2.5-VL is QWEN-IMAGE's sole conditioning encoder (for prompts, instructions, input images, controls, etc.), as well as a powerful vision-language model in its own right. <br>

As such, this repo pairs a lean NF4 DiT with the **[Qwen2.5-VL-7B-Abliterated-Caption-it](https://huggingface.co/prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it/tree/main)** by [Prithiv Sakthi](https://huggingface.co/prithivMLmods) (aka [prithivMLmods](https://github.com/prithivsakthiur)).
<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/merge3.jpg" width="1600"/>
</p>

# NOTICE: 
*Do not be alarmed by the file warning from the automated ClamAV checker.* <br>
*It is a clear false positive. When scanning one of the typical Diffusers-format Safetensors shards (model weights), the checker reports:*
``The following viruses have been found: Pickle.Malware.SysAccess.sys.STACK_GLOBAL.UNOFFICIAL`` <br> 
*However, a Safetensors file by design cannot contain pickled objects or executable code. You can confirm this yourself through HF's built-in weight/index viewer.* <br> 
*To be clear, this repo does **not** contain any pickle checkpoints or any other pickled data.* <br> 
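
To illustrate why this is the case, the sketch below parses a Safetensors header using only the standard library. The format is an 8-byte little-endian length, followed by a JSON header describing each tensor (dtype, shape, byte offsets), followed by raw tensor bytes; nothing in it is unpickled or executed. The helper names and the tiny in-memory example are hypothetical and exist only for demonstration.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a Safetensors byte string."""
    (header_len,) = struct.unpack("<Q", data[:8])  # little-endian unsigned 64-bit
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))


def make_tiny_safetensors() -> bytes:
    """Build a minimal in-memory example: one float32 tensor of shape [2]."""
    payload = struct.pack("<2f", 1.0, 2.0)
    header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


header = read_safetensors_header(make_tiny_safetensors())
for name, info in header.items():
    print(name, info["dtype"], info["shape"], info["data_offsets"])
```

To inspect a real shard the same way, one could open the file in binary mode and read just the first 8 bytes plus the header length they encode, without loading any tensor data.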

# TEXT-TO-IMAGE PIPELINE EXAMPLE:
This repo is formatted for use with the Diffusers (0.35.0.dev0+) and Transformers libraries, through their associated pipelines and model component classes, such as the defaults listed in `model_index.json` (in this repo's root folder). <br>
*Sourced/adapted from [the original base model repo](https://huggingface.co/Qwen/Qwen-Image) by QWEN.* <br>
**EDIT: We've run into some issues when using the pipeline below. We will update once reliable adjustments are confirmed.** <br>

```python
from diffusers import DiffusionPipeline
import torch
import bitsandbytes  # noqa: F401  (imported so a missing install fails early)

model_name = "AlekseyCalvin/QwenImage_nf4"

# Pick dtype and device
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

# Load the pipeline
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

# Quality suffixes appended to the prompt
positive_magic = {
    "en": " Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": "超清,4K,电影级构图",  # for Chinese prompts
}

# Prompt (the quality suffix is appended below, so it is not repeated here)
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".'''
negative_prompt = " "

# Supported aspect ratios as (width, height)
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
}
width, height = aspect_ratios["16:9"]

# Generate and save the image
image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    # Create the generator on the selected device so CPU-only runs also work
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]
image.save("example.png")
```
<br>
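
As a rough aid for checking which component classes a Diffusers repo expects, the sketch below lists the entries of a `model_index.json`. Each component is stored as a `[library, class_name]` pair, while keys starting with an underscore hold pipeline metadata. The sample contents are an assumption made for illustration and may not match this repo's actual file; `list_components` is a hypothetical helper.

```python
import json

# Hypothetical sample contents for illustration only;
# check the model_index.json in this repo's root folder for the real values.
SAMPLE_MODEL_INDEX = """
{
  "_class_name": "QwenImagePipeline",
  "_diffusers_version": "0.35.0.dev0",
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"],
  "text_encoder": ["transformers", "Qwen2_5_VLForConditionalGeneration"],
  "tokenizer": ["transformers", "Qwen2Tokenizer"],
  "transformer": ["diffusers", "QwenImageTransformer2DModel"],
  "vae": ["diffusers", "AutoencoderKLQwenImage"]
}
"""


def list_components(model_index_text):
    """Return {component_name: (library, class_name)}, skipping metadata keys."""
    index = json.loads(model_index_text)
    return {
        name: tuple(value)
        for name, value in index.items()
        if not name.startswith("_") and isinstance(value, list)
    }


for name, (library, class_name) in list_components(SAMPLE_MODEL_INDEX).items():
    print(f"{name}: {library}.{class_name}")
```

With a local copy of the repo, the same helper could be applied to the text of the real `model_index.json` to see which classes the pipeline will instantiate.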

# SHOWCASES FROM THE QWEN TEAM:
![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s1.jpg#center)
![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s3.jpg#center)
![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s2.jpg#center)

# MORE INFO:
- Check out the [Technical Report](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf) for QWEN-IMAGE, released by the Qwen team! <br>
- Find the source base model weights on [Hugging Face](https://huggingface.co/Qwen/Qwen-Image) and on [ModelScope](https://modelscope.cn/models/Qwen/Qwen-Image).

## QWEN LINKS:
<p align="center">
          💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/Qwen/Qwen-Image">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf">Tech Report</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://qwenlm.github.io/blog/qwen-image/">Blog</a>
<br>
🖥️ <a href="https://huggingface.co/spaces/Qwen/qwen-image">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="https://github.com/QwenLM/Qwen-Image/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp;&nbsp; | &nbsp;&nbsp;🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>
</p>

## QWEN-IMAGE TECHNICAL REPORT CITATION:
```bibtex
@article{qwen-image,
    title={Qwen-Image Technical Report}, 
    author={Qwen Team},
    journal={arXiv preprint},
    year={2025}
}
```