🌌 Skywork-UniPic-1.5B


📖 Introduction

Skywork-UniPic is a unified autoregressive multimodal model with 1.5 billion parameters, capable of handling three key vision-language tasks within a single architecture:

  • 🖼️ Image Understanding
  • 🎨 Text-to-Image Generation
  • ✏️ Image Editing

Trained from scratch on a large-scale multimodal corpus, UniPic is designed to support a wide range of unified image-text tasks efficiently.

Model Teaser

📊 Benchmarks

Skywork-UniPic reports the following scores on text-to-image generation benchmarks (GenEval, DPG-Bench) and image editing benchmarks (GEditBench-EN, ImgEdit-Bench):

Benchmark           Score
🧠 GenEval           0.86
🖼️ DPG-Bench         85.5
✂️ GEditBench-EN     5.83
🧪 ImgEdit-Bench     3.49
Benchmark Results

🧠 Usage

1. Clone the Repository

git clone https://github.com/SkyworkAI/UniPic
cd UniPic

2. Set Up the Environment

conda create -n unipic python=3.10.14
conda activate unipic
pip install -r requirements.txt
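
Before running the scripts, it can help to confirm that the active interpreter really is the one from the `unipic` environment. The following is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper, and it assumes conda exports the `CONDA_DEFAULT_ENV` variable for the active environment.

```python
import os
import sys


def check_environment(version_info=None, environ=None,
                      expected_env="unipic", expected_python=(3, 10)):
    """Return a list of problems found; an empty list means the setup looks right.

    Hypothetical helper (not from the UniPic repo). The first two arguments
    default to the running interpreter and process environment, and exist
    mainly so the check can be exercised with fake values.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ

    problems = []
    # The setup step pins Python 3.10.14; only major.minor is compared here.
    if tuple(version_info[:2]) != tuple(expected_python):
        found = ".".join(str(part) for part in version_info[:2])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python {found} is active, expected {wanted}")
    # Assumption: conda sets CONDA_DEFAULT_ENV to the active environment's name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            f"active conda environment is {active!r}, expected {expected_env!r}"
        )
    return problems


# Print any warnings for the current process.
for problem in check_environment():
    print("warning:", problem)
```

If nothing is printed, the interpreter version and environment name match the setup commands above. The check does not verify that the packages from `requirements.txt` were installed.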

3. Image Editing

export PYTHONPATH=./:$PYTHONPATH

python scripts/image_edit.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
  --checkpoint checkpoint/pytorch_model.bin \
  --image_size 1024 \
  --image data/sample.png \
  --prompt "Replace the stars with the candle." \
  --output output.jpg
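
To call the editing script from Python rather than the shell, the same invocation can be assembled as an argument list. This is a sketch under stated assumptions: `build_edit_command` and `run_edit` are hypothetical helpers, not part of the repository, and they use only the flags and paths shown in the command above.

```python
import os
import subprocess
import sys

# Paths copied from the command above; adjust them to your checkout.
CONFIG = "configs/models/qwen2_5_1_5b_kl16_mar_h.py"
CHECKPOINT = "checkpoint/pytorch_model.bin"


def build_edit_command(image, prompt, output, image_size=1024,
                       config=CONFIG, checkpoint=CHECKPOINT):
    """Build the argument list for scripts/image_edit.py (hypothetical helper)."""
    return [
        sys.executable, "scripts/image_edit.py", config,
        "--checkpoint", checkpoint,
        "--image_size", str(image_size),
        "--image", str(image),
        "--prompt", prompt,
        "--output", str(output),
    ]


def run_edit(image, prompt, output, repo_root=".", **kwargs):
    """Run one edit from repo_root with the repo root prepended to PYTHONPATH."""
    env = dict(os.environ)
    # Mirrors `export PYTHONPATH=./:$PYTHONPATH`, skipping an empty existing value.
    parts = ["./"] + [p for p in [env.get("PYTHONPATH", "")] if p]
    env["PYTHONPATH"] = os.pathsep.join(parts)
    cmd = build_edit_command(image, prompt, output, **kwargs)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(cmd, cwd=repo_root, env=env, check=True)
```

For example, `run_edit("data/sample.png", "Replace the stars with the candle.", "output.jpg")` reproduces the shell command. Passing the arguments as a list avoids shell quoting problems when a prompt contains quotes or other special characters.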

4. Text-to-Image Generation

export PYTHONPATH=./:$PYTHONPATH

python scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
  --checkpoint checkpoint/pytorch_model.bin \
  --image_size 1024 \
  --prompt "A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue." \
  --output output.jpg
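
When generating images for several prompts, each prompt needs its own output file so that results are not overwritten. The sketch below prepares one command per prompt using only the flags shown above. `build_generation_commands`, the `outputs` directory, and the `sample_NNN.jpg` naming are illustrative assumptions, not conventions of the repository.

```python
import sys
from pathlib import Path

# Paths copied from the command above; adjust them to your checkout.
CONFIG = "configs/models/qwen2_5_1_5b_kl16_mar_h.py"
CHECKPOINT = "checkpoint/pytorch_model.bin"


def build_generation_commands(prompts, output_dir="outputs", image_size=1024):
    """Return one (command, output_path) pair per prompt.

    Hypothetical helper: the output directory and file naming are
    illustrative choices made for this example.
    """
    jobs = []
    for index, prompt in enumerate(prompts):
        # Zero-padded index keeps the files sorted in prompt order.
        output_path = Path(output_dir) / f"sample_{index:03d}.jpg"
        command = [
            sys.executable, "scripts/text2image.py", CONFIG,
            "--checkpoint", CHECKPOINT,
            "--image_size", str(image_size),
            "--prompt", prompt,
            "--output", output_path.as_posix(),
        ]
        jobs.append((command, output_path))
    return jobs
```

Each command can then be passed to `subprocess.run` from the repository root, with `PYTHONPATH` set as in the export line above and the output directory created beforehand. Every invocation is a separate process that loads the checkpoint again, so this approach favors simplicity over throughput.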

📄 License

This model is released under the MIT License.
