lbwang committed
Commit 1c92716 · verified · 1 Parent(s): 1bd02b6

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -1,3 +1,12 @@
+---
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
+tags:
+- text-to-image
+- image-to-image
+- image-to-text
+---
 <h1 align="center">Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction</h1>
 
 Ming-Lite-Uni is an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel **multi-scale learnable tokens** and **multi-scale representation alignment strategy**. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results reveal the strong performance of Ming-Lite-Uni and illustrate the impressively fluid nature of its interactive process. Ming-Lite-Uni is in the alpha stage and will soon be further refined.
@@ -107,5 +116,4 @@ inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)
 result = model.image_gen_generate(inputs, steps=30, seed=42, cfg=5.0, height=512, width=512)[1]
 result.save("result.png")
 ```
-For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
-
+For more advanced usage, such as fine-tuning or generating images, refer to the documentation.
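For context, the code touched by the second hunk is the tail of the README's image-generation example. The sketch below is a minimal, hypothetical wrapper around the two calls visible in this diff (`my_proc.process` and `model.image_gen_generate`); how `model` and `my_proc` are constructed is not shown in this commit, so they are passed in as-is, and the `image_file=None` text-to-image assumption and the `out_path` parameter are illustrative only.

```python
# Hypothetical convenience wrapper around the calls shown in the README diff.
# `model` and `my_proc` must be created per the repository's own setup steps,
# which are outside this commit; nothing here is an official Ming-Lite-Uni API.

def generate_image(model, my_proc, prompt, image_file=None, device="cuda",
                   out_path="result.png"):
    """Run the README's generation snippet: preprocess, sample, save.

    Passing an existing `image_file` would correspond to the instruction-based
    editing path; leaving it None is assumed (not confirmed by this diff) to
    trigger plain text-to-image generation.
    """
    # Preprocessing call taken verbatim from the hunk header context.
    inputs = my_proc.process(image_file=image_file, prompt=prompt, device=device)

    # Sampling call taken verbatim from the diff; index [1] selects the image,
    # as in the README snippet.
    result = model.image_gen_generate(
        inputs, steps=30, seed=42, cfg=5.0, height=512, width=512
    )[1]

    result.save(out_path)
    return result
```

The sampling parameters (`steps=30`, `seed=42`, `cfg=5.0`, `height=512`, `width=512`) are taken directly from the diff context rather than chosen here.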