m1ngcheng and nielsr (HF Staff) committed
Commit f4d2fa4 · verified · 1 Parent(s): 1c92716

Add model card (#1)


- Add model card (c7e43b1a9d90b0c22c5472f863508fe3c3a28f5f)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
+library_name: transformers
 license: mit
 pipeline_tag: image-text-to-text
-library_name: transformers
 tags:
 - text-to-image
 - image-to-image
 - image-to-text
 ---
+
 <h1 align="center">Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction</h1>

 Ming-Lite-Uni is an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel **multi-scale learnable tokens** and **multi-scale representation alignment strategy**. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results reveal the strong performance of Ming-Lite-Uni and illustrate the impressive fluid nature of its interactive process. Ming-Lite-Uni is in alpha stage and will soon be further refined.
@@ -116,4 +117,6 @@ inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)
 result = model.image_gen_generate(inputs, steps=30, seed=42, cfg=5.0, height=512, width=512)[1]
 result.save("result.png")
 ```
-For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
+For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
+
+Link to the code: https://github.com/inclusionAI/Ming
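
The second hunk shows only the tail of the model card's image-generation example. As a minimal sketch of how the visible calls fit together: the wrapper function, its name, and its defaults below are illustrative, and it assumes an already-loaded Ming-Lite-Uni `model` and processor `my_proc` (how to load them is covered in the linked repository, not in this diff).

```python
# Minimal sketch, not an authoritative example: it only repackages the three calls
# visible in the hunk above. `model` and `my_proc` are assumed to be loaded as
# described at https://github.com/inclusionAI/Ming.

def generate_image(model, my_proc, prompt, image_file=None, device="cuda",
                   out_path="result.png"):
    """Run the model card's image-generation snippet end to end."""
    # Pack the optional source image and the text prompt into model inputs.
    inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)
    # 30 sampling steps, fixed seed, cfg=5.0 (presumably the classifier-free-guidance
    # scale), 512x512 output; the model card keeps element [1] of the returned value,
    # which exposes a PIL-style .save().
    result = model.image_gen_generate(inputs, steps=30, seed=42, cfg=5.0,
                                      height=512, width=512)[1]
    result.save(out_path)
    return result
```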