Add model card (#1)
- Add model card (c7e43b1a9d90b0c22c5472f863508fe3c3a28f5f)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
+library_name: transformers
 license: mit
 pipeline_tag: image-text-to-text
-library_name: transformers
 tags:
 - text-to-image
 - image-to-image
 - image-to-text
 ---
+
 <h1 align="center">Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction</h1>
 
 Ming-Lite-Uni is an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel **multi-scale learnable tokens** and **multi-scale representation alignment strategy**. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results reveal the strong performance of Ming-Lite-Uni and illustrate the impressively fluid nature of its interactive process. Ming-Lite-Uni is in an alpha stage and will soon be further refined.
@@ -116,4 +117,6 @@ inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)
 result = model.image_gen_generate(inputs, steps=30, seed=42, cfg=5.0, height=512, width=512)[1]
 result.save("result.png")
 ```
-For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
+For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
+
+Link to the code: https://github.com/inclusionAI/Ming
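For readers who want to try the snippet shown in the diff, here is a minimal, hypothetical sketch of how it might be wired together. Only the `my_proc.process(...)`, `model.image_gen_generate(...)`, and `result.save(...)` calls come from the README being diffed; the `load_ming_lite_uni` helper and the example paths/prompt are placeholders, not the repository's actual API (see https://github.com/inclusionAI/Ming for the real setup).

```python
# Hypothetical sketch only: the loading helper below is a placeholder, not the
# repository's real API. Only process(), image_gen_generate(), and save()
# appear in the README snippet shown in the diff above.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder: replace with the actual model/processor construction from
# https://github.com/inclusionAI/Ming (not shown in this diff).
model, my_proc = load_ming_lite_uni(device=device)  # hypothetical helper

image_file = "input.png"  # optional reference image; path is a placeholder
prompt = "a watercolor painting of a lighthouse at sunset"  # example prompt

# The calls and generation parameters below mirror the README snippet.
inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)
result = model.image_gen_generate(inputs, steps=30, seed=42, cfg=5.0,
                                  height=512, width=512)[1]
result.save("result.png")
```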