Skywork-UniPic
Unified Autoregressive Modeling for Visual Understanding and Generation
Skywork-UniPic is a unified autoregressive multimodal model with 1.5 billion parameters. It handles three key vision-language tasks within a single architecture: image understanding, text-to-image generation, and image editing.
Trained from scratch on a large-scale multimodal corpus, UniPic is designed to support a wide range of unified image-text tasks efficiently.
Skywork-UniPic achieves competitive results on text-to-image generation benchmarks (GenEval, DPG-Bench) and image editing benchmarks (GEditBench-EN, ImgEdit-Bench):
| Task | Score |
|---|---|
| GenEval | 0.86 |
| DPG-Bench | 85.5 |
| GEditBench-EN | 5.83 |
| ImgEdit-Bench | 3.49 |
git clone https://github.com/SkyworkAI/UniPic
cd UniPic
conda create -n unipic python=3.10.14
conda activate unipic
pip install -r requirements.txt
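After installation, a quick sanity check can catch the most common setup problems before running inference. The sketch below is a hypothetical helper (the name `preflight` is illustrative and not part of the repository). It assumes the environment should run Python 3.10, as in the `conda create` line, and that the weights live at `checkpoint/pytorch_model.bin`, the path used by the inference commands in this document.

```python
import sys
from pathlib import Path


def preflight(repo_root=".", version=sys.version_info):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    Hypothetical helper: the expected files are taken from the commands in this README.
    """
    problems = []
    # The conda environment above pins Python 3.10.14; only major.minor is checked here.
    if tuple(version[:2]) != (3, 10):
        problems.append(f"expected Python 3.10, found {version[0]}.{version[1]}")
    root = Path(repo_root)
    for rel in ("requirements.txt", "checkpoint/pytorch_model.bin"):
        if not (root / rel).exists():
            problems.append(f"missing {rel}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run it from the repository root; it only reports problems and does not modify anything.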
export PYTHONPATH=./:$PYTHONPATH
python scripts/image_edit.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin \
--image_size 1024 \
--image data/sample.png \
--prompt "Replace the stars with the candle." \
--output output.jpg
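To apply the editing script to several images, the command above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_edit_cmd` is a hypothetical helper, not part of the repository, and it uses only the flags shown in the command above. It prints the commands for inspection rather than running them.

```python
import shlex
from pathlib import Path

# Paths copied from the command above; adjust them to your checkout.
CONFIG = "configs/models/qwen2_5_1_5b_kl16_mar_h.py"
CHECKPOINT = "checkpoint/pytorch_model.bin"


def build_edit_cmd(image, prompt, output, image_size=1024):
    """Return the argv list for scripts/image_edit.py (hypothetical helper)."""
    return [
        "python", "scripts/image_edit.py", CONFIG,
        "--checkpoint", CHECKPOINT,
        "--image_size", str(image_size),
        "--image", str(image),
        "--prompt", prompt,
        "--output", str(output),
    ]


if __name__ == "__main__":
    # (input image, edit instruction) pairs; extend this list as needed.
    jobs = [("data/sample.png", "Replace the stars with the candle.")]
    for image, prompt in jobs:
        # Write each result next to its input, e.g. data/sample_edited.jpg.
        output = Path(image).with_name(Path(image).stem + "_edited.jpg")
        cmd = build_edit_cmd(image, prompt, output)
        print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to execute instead
```

Building the command as a list avoids shell-quoting problems when prompts contain quotes or other special characters.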
export PYTHONPATH=./:$PYTHONPATH
python scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
--checkpoint checkpoint/pytorch_model.bin \
--image_size 1024 \
--prompt "A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue." \
--output output.jpg
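Generating images for a list of prompts follows the same pattern. The sketch below is illustrative only: `build_t2i_cmd`, `repo_env`, and `generate_all` are hypothetical names, the `outputs` directory is an assumption, and only the flags from the command above are used. Note that each call starts a new process and therefore reloads the checkpoint.

```python
import os
import subprocess

# Paths copied from the command above; adjust them to your checkout.
CONFIG = "configs/models/qwen2_5_1_5b_kl16_mar_h.py"
CHECKPOINT = "checkpoint/pytorch_model.bin"


def build_t2i_cmd(prompt, output, image_size=1024):
    """Return the argv list for scripts/text2image.py (hypothetical helper)."""
    return [
        "python", "scripts/text2image.py", CONFIG,
        "--checkpoint", CHECKPOINT,
        "--image_size", str(image_size),
        "--prompt", prompt,
        "--output", output,
    ]


def repo_env(base=None):
    """Copy the environment and prepend ./ to PYTHONPATH, mirroring the export above."""
    env = dict(os.environ if base is None else base)
    env["PYTHONPATH"] = "./" + os.pathsep + env.get("PYTHONPATH", "")
    return env


def generate_all(prompts, out_dir="outputs", run=subprocess.run):
    """Run one text2image call per prompt and return the output paths."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for i, prompt in enumerate(prompts):
        output = os.path.join(out_dir, f"{i:03d}.jpg")
        # check=True raises CalledProcessError if the script exits with an error.
        run(build_t2i_cmd(prompt, output), env=repo_env(), check=True)
        outputs.append(output)
    return outputs
```

Call `generate_all([...])` from the repository root. The `run` parameter exists so the function can be exercised with a stub instead of launching the model.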
This model is released under the MIT License.