Commit b2ddf47 (verified) by littlebird13 and naykun · 1 parent: 2dab4d9

update model card (#1)

- update model card (9bbb8fdd50de6233389bd2f26a45c7dc9940c182)


Co-authored-by: Kun Yan <[email protected]>

Files changed (1)
  1. README.md +134 -3
README.md CHANGED
@@ -1,3 +1,134 @@
- ---
- license: apache-2.0
- ---
+ <p align="center">
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png" width="400"/>
+ </p>
+ <p align="center">
+ 💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/xxx">arXiv</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://qwenlm.github.io/blog/qwen-image/">Blog</a> &nbsp;&nbsp;
+ <br>
+ 🖥️ <a href="https://huggingface.co/spaces/Qwen/Qwen-Image-Demo">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp;&nbsp; | &nbsp;&nbsp;🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp;&nbsp;
+ </p>
+
+ <p align="center">
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/meet_en.png#center" width="800"/>
+ </p>
+
+ ## Introduction
+ We are thrilled to release **Qwen-Image**, an image generation foundation model in the Qwen series that achieves significant advances in **complex text rendering** and **precise image editing**. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.
+
+ ![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/bench.png#center)
+
+ ## News
+ - 2025.08.04: We released the [Qwen-Image Tech Report]().
+ - 2025.08.04: We released the Qwen-Image weights! Find them on [Hugging Face](https://huggingface.co/Qwen/Qwen-Image) and [ModelScope](https://modelscope.cn/models/Qwen/Qwen-Image).
+ - 2025.08.04: We released Qwen-Image! See our [blog](https://qwenlm.github.io/blog/qwen-image) for more details.
+
+
+ ## Quick Start
+
+ Install the latest version of diffusers:
+ ```
+ pip install git+https://github.com/huggingface/diffusers
+ ```
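After running the pip command above, it can help to confirm which diffusers build ended up in the environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, not part of diffusers, and the check only reports what is installed rather than asserting any minimum version.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("diffusers")
    if version is None:
        print("diffusers is not installed; run the pip command above first")
    else:
        print(f"diffusers {version} is installed")
```

A source install from the Git repository typically reports a development version string, which is one way to tell it apart from a release installed from PyPI.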
+
+ The following snippet shows how to use the model to generate an image from a text prompt:
+
+ ```python
+ from diffusers import DiffusionPipeline
+ import torch
+
+ model_name = "Qwen/Qwen-Image"
+
+ # Pick dtype and device: bfloat16 on GPU, float32 on CPU
+ if torch.cuda.is_available():
+     torch_dtype = torch.bfloat16
+     device = "cuda"
+ else:
+     torch_dtype = torch.float32
+     device = "cpu"
+
+ # Load the pipeline
+ pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
+ pipe = pipe.to(device)
+
+ # Quality suffix appended to the prompt
+ positive_magic = "Ultra HD, 4K, cinematic composition."  # for English prompts
+ # positive_magic = "超清,4K,电影级构图"  # for Chinese prompts
+
+ # Prompt (the quality suffix is appended in the pipe() call, so it is not repeated here)
+ prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".'''
+
+ negative_prompt = " "
+
+ # Resolutions for different aspect ratios
+ aspect_ratios = {
+     "1:1": (1328, 1328),
+     "16:9": (1664, 928),
+     "9:16": (928, 1664),
+     "4:3": (1472, 1140),
+     "3:4": (1140, 1472),
+ }
+
+ width, height = aspect_ratios["16:9"]
+
+ # Generate the image
+ image = pipe(
+     prompt=prompt + " " + positive_magic,
+     negative_prompt=negative_prompt,
+     width=width,
+     height=height,
+     num_inference_steps=50,
+     true_cfg_scale=4.0,
+     # Create the generator on the selected device so the snippet also runs without CUDA
+     generator=torch.Generator(device=device).manual_seed(42),
+ ).images[0]
+
+ image.save("example.png")
+ ```
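The snippet above selects `aspect_ratios["16:9"]` by hand. As a minimal sketch, assuming you start from an arbitrary target size and want the nearest entry from that table, the hypothetical helper `closest_aspect_ratio` below (not part of diffusers or Qwen-Image) compares width/height ratios on a log scale. The resolution table is copied from the snippet; everything else is illustrative.

```python
import math
from typing import Dict, Tuple

# Resolutions copied from the Quick Start snippet: label -> (width, height).
ASPECT_RATIOS: Dict[str, Tuple[int, int]] = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
}


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the label of the listed resolution whose width/height ratio is nearest."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = ASPECT_RATIOS[label]
        # Log scale treats landscape and portrait deviations symmetrically.
        return abs(math.log(w / h) - target)

    return min(ASPECT_RATIOS, key=distance)


if __name__ == "__main__":
    label = closest_aspect_ratio(1920, 1080)
    width, height = ASPECT_RATIOS[label]
    print(label, width, height)
```

The returned label can then replace the hard-coded key, for example `width, height = ASPECT_RATIOS[closest_aspect_ratio(1920, 1080)]`.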
+
+ ## Showcase
+
+ One of Qwen-Image's standout capabilities is high-fidelity text rendering across diverse images. For both alphabetic languages like English and logographic scripts like Chinese, it preserves typographic detail, layout coherence, and contextual harmony. Text is not simply overlaid; it is integrated into the scene.
+
+ ![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s1.jpg#center)
+
+ Beyond text, Qwen-Image excels at general image generation and supports a wide range of artistic styles. From photorealistic scenes to impressionist paintings, and from anime aesthetics to minimalist design, the model adapts to creative prompts, making it a versatile tool for artists, designers, and storytellers.
+
+ ![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s2.jpg#center)
+
+ For image editing, Qwen-Image goes well beyond simple adjustments. It supports advanced operations such as style transfer, object insertion and removal, detail enhancement, text editing within images, and human pose manipulation, all with intuitive input and coherent output. This level of control brings professional-grade editing within reach of everyday users.
+
+ ![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s3.jpg#center)
+
+ Qwen-Image does more than create and edit images; it also understands them. It supports a suite of image understanding tasks, including object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and super-resolution. These tasks, while technically distinct, can all be viewed as specialized forms of intelligent image editing powered by deep visual comprehension.
+
+ ![](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/s4.jpg#center)
+
+ Together, these features make Qwen-Image not just a tool for generating pretty pictures but a comprehensive foundation model for intelligent visual creation and manipulation, where language, layout, and imagery converge.
+
+
+ ## License Agreement
+
+ Qwen-Image is licensed under Apache 2.0.
+
+ ## Citation
+
+ We kindly encourage citation of our work if you find it useful.
+
+ ```bibtex
+ @article{qwen-image,
+     title={Qwen-Image Technical Report},
+     author={Qwen Team},
+     journal={arXiv preprint},
+     year={2025}
+ }
+ ```
+
+
+ ## Contact and Join Us
+
+
+ If you'd like to get in touch with our research team, we'd love to hear from you! Join our [Discord](https://discord.gg/z3GAxXZ9Ce) or scan the QR code to connect via our [WeChat groups](assets/wechat.png). We're always open to discussion and collaboration.
+
+ If you have questions about this repository, feedback to share, or want to contribute directly, we welcome your issues and pull requests on GitHub. Your contributions help make Qwen-Image better for everyone.
+
+ If you're passionate about fundamental research, we're hiring full-time employees (FTEs) and research interns. Don't wait: reach out to us at [email protected]
+
+
+
+
+