showlab/show-o2-1.5B

Sierkinhane committed on 7 days ago

Commit 111e2f9 · verified · 1 Parent(s): 2fef922

Update README.md

Files changed (1)

README.md +1 -1

@@ -15,7 +15,7 @@
 ## What is the new about Show-o2?
 We perform the unified learning of multimodal understanding and generation on the text token and **3D Causal VAE space**, which is scalable for **text, image, and video modalities**. A dual-path of spatial (-temporal) fusion is proposed to accommodate the distinct feature dependency  of multimodal understanding and generation. We employ specific heads with **autoregressive modeling and flow matching** for the overall unified learning of **multimodal understanding, image/video and mixed-modality generation.**
+<img src="overview.png" width="1000">
 ## Pre-trained Model Weigths
 The Show-o2 checkpoints can be found on Hugging Face: