🌌 Skywork-UniPic-1.5B

📖 Introduction

Skywork-UniPic is a unified autoregressive multimodal model with 1.5 billion parameters, capable of handling three key vision-language tasks within a single architecture:

🖼️ Image Understanding
🎨 Text-to-Image Generation
✏️ Image Editing

Trained from scratch on a large-scale multimodal corpus, UniPic is designed to support a wide range of unified image-text tasks efficiently.

📊 Benchmarks

Skywork-UniPic reports the following scores on text-to-image generation benchmarks (GenEval, DPG-Bench) and image editing benchmarks (GEditBench-EN, ImgEdit-Bench):

Benchmark	Score
🧠 GenEval	0.86
🖼️ DPG-Bench	85.5
✂️ GEditBench-EN	5.83
🧪 ImgEdit-Bench	3.49

🧠 Usage

1. Clone the Repository

git clone https://github.com/SkyworkAI/UniPic
cd UniPic

2. Set Up the Environment

conda create -n unipic python=3.10.14
conda activate unipic
pip install -r requirements.txt
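
Before running the scripts, it can help to confirm that the `unipic` environment is actually active. The sketch below is one way to do that, not part of the repository: `in_conda_env` is a hypothetical helper name, and it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```shell
# Hypothetical helper (not part of the UniPic repository).
# Succeeds only if the currently active conda environment has the given name.
in_conda_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}
```

For example, `in_conda_env unipic || echo "run: conda activate unipic"` prints a reminder when the environment is not active.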

3. Image Editing

export PYTHONPATH=./:$PYTHONPATH

python scripts/image_edit.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
  --checkpoint checkpoint/pytorch_model.bin \
  --image_size 1024 \
  --image data/sample.png \
  --prompt "Replace the stars with the candle." \
  --output output.jpg
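
The command above expects the checkpoint and the input image to exist at the given paths. A minimal pre-flight check, assuming nothing beyond a POSIX shell, could look like the sketch below; `require_files` is a hypothetical helper name and is not provided by the repository.

```shell
# Hypothetical helper (not part of the UniPic repository).
# Reports every path that is not a regular file and returns non-zero
# if at least one is missing.
require_files() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing file: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `require_files checkpoint/pytorch_model.bin data/sample.png` before the editing command lists any missing path up front, before the script is launched.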

4. Text-to-Image Generation

export PYTHONPATH=./:$PYTHONPATH

python scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
  --checkpoint checkpoint/pytorch_model.bin \
  --image_size 1024 \
  --prompt "A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue." \
  --output output.jpg
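
To generate images for several prompts, the same command can be wrapped in a loop. The following is a sketch under stated assumptions rather than a script shipped with the repository: `generate_batch` and the `UNIPIC_RUNNER` variable are hypothetical names, the prompt file is assumed to hold one prompt per line, and the config, checkpoint, and flags are copied from the command above.

```shell
# Hypothetical helper (not part of the UniPic repository).
# Runs scripts/text2image.py once per non-empty line of a prompt file.
# UNIPIC_RUNNER defaults to "python"; set it to "echo" for a dry run
# that only prints the arguments that would be passed.
generate_batch() {
  local prompt_file="$1" out_dir="$2"
  local runner="${UNIPIC_RUNNER:-python}"
  local i=0 prompt
  mkdir -p "$out_dir"
  # Read prompts on file descriptor 3 so the runner cannot consume them via stdin.
  while IFS= read -r prompt <&3 || [ -n "$prompt" ]; do
    if [ -z "$prompt" ]; then
      continue  # skip blank lines
    fi
    i=$((i + 1))
    "$runner" scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
      --checkpoint checkpoint/pytorch_model.bin \
      --image_size 1024 \
      --prompt "$prompt" \
      --output "$out_dir/output_$i.jpg" || return 1
  done 3< "$prompt_file"
}
```

With `PYTHONPATH` exported as above, `generate_batch prompts.txt outputs` writes `outputs/output_1.jpg`, `outputs/output_2.jpg`, and so on, one image per prompt. The checkpoint is reloaded for every prompt, so this trades speed for simplicity.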

📄 License

This model is released under the MIT License.
